Preface


This book contains the proceedings of an International Symposium on
Algorithms for Approximation Four (A4A4), held at the University of
Huddersfield from July 15th to 20th, 2001, and attended by 106 people from
no fewer than 32 countries. The accommodation base was the attractive
University Park at Storthes Hall, where social events were centred. There was
a very friendly atmosphere, helped by the presence of a significant number
of younger people to balance the stalwarts. The food was excellent and the
weather was generally good.

This was the fourth, after a pause of 9 years, in the series of
"Algorithms for Approximation" meetings, held previously in Oxfordshire in
1985, 1988 and 1992. Once again it was run under the sponsorship of the US
Air Force (European Office of Aerospace Research and Development), this
time with grants from the London Mathematical Society and the National
Physical Laboratory (NPL) (Software Support for Metrology).

The Organising Committee consisted of Iain Anderson, John Mason and
David Turner (Huddersfield), Maurice Cox and Alistair Forbes (NPL), and
Jeremy Levesley and Will Light (Leicester). In addition to them, the
Programme Committee included Claude Brezinski (Lille), Martin Buhmann
(Giessen), Tim Goodman (Dundee), Tom Lyche (Oslo), Alistair Watson
(Dundee) and Larry Schumaker (Vanderbilt). In support of the committee,
the Symposium Secretary, Ros Hawkins, was extremely efficient, and was
helped by Karen Mitchell.

Moving to the academic programme, there were 11 invited speakers.
From the UK were Maurice Cox, Tim Goodman, Alistair Watson and Will
Light; from other parts of Europe were Martin Buhmann (Giessen),
Michael Floater (SINTEF Oslo), Lars Nielsen (Danish Institute of Fundamental
Metrology) and Gerlind Plonka (Duisburg); from the USA were Tomaso Poggio
(MIT) and Larry Schumaker (Vanderbilt); and from South Africa, Kathy
Driver (Witwatersrand).

In addition there were 74 submitted papers given at the meeting, of which
a good proportion were offered in Special Sessions in Metrology-Maths (run
by David Turner), Metrology-Stats (Alistair Forbes), Orthogonal
Polynomials and Padé Approximation (Claude Brezinski and Peter Graves-Morris
(Bradford)), Spline Functions (Tom Lyche), Mathematical Modelling in
Medicine (Ewald Quak), Integrals and Integral Equations (Ezio Venturino
(Torino)) and Wavelets (Richard Zalik (Auburn)).

The current volume contains a substantial portion of the papers from
the conference, which were provided by the speakers, so that this is a solid
and broad contribution to the area. The book has been organised in topics
to suit the final selection of papers.

All  submitted  papers  were  refereed  and  significant  modifications  were 
made  to  a  number  of  papers.  In  general,  there  was  a  high  standard  of 
submissions. 

We cannot conclude this preface without mentioning the celebration of
three 60th birthdays of 2001 at the meeting, namely those of Claude
Brezinski, Maurice Cox, and John Mason. All played major parts in the
Symposium.

We must finish by offering thanks to all the staff at the University of
Huddersfield, NPL, USAF-EOARD, the London Mathematical Society, and the
publishers, who contributed to this most successful and memorable symposium.

Thanks  also  go  to  Jeremy  Levesley  and  Iain  Anderson  and  the  publishers, 
who  worked  so  hard  on  the  proceedings,  and  to  all  authors  without  whom 
the  volume  would  not  exist. 


John  Mason 
Huddersfield 
Abstract

A  strategy  to  construct  a  grid  conforming  to  the  boundaries  of  a  prescribed  domain  by 
using  transfinite  interpolation  methods  is  discussed.  A  transfinite  interpolation  procedure 
is  combined  with  a  B-spline  tensor  product  scheme  defined  by  using  suitable  control 
points.  Their  choice  is  performed  by  taking  into  account  a  quality  measure  parameter 
based  on  the  condition  number  of  matrices  linked  to  the  covariant  metric  tensors. 


1  Introduction 

The algebraic grid generation approach relies on the construction of a coordinate
transformation from the computational domain into the physical domain. In particular,
this can be obtained through transfinite interpolating operators, which allow the
generation of grids with boundary conformity. Furthermore, using a Hermite-type
transfinite interpolating scheme we can obtain orthogonal grid lines emanating from the
boundary. This can be very important for practical reasons since the grid point
distribution in the immediate neighborhood of the boundaries has a strong influence on
the accuracy of the numerical solution of partial differential equations [5].
Furthermore, in case a domain decomposition is necessary, the orthogonality guarantees
smoother grids. In order to obtain a grid with other specified properties, e.g. the
control of the shape and position of the coordinate curves, transfinite interpolating
methods can be combined with tensor product schemes using suitably chosen control
points (see for instance [1, 2, 6, 7, 8]). Even though this type of algebraic method is
computationally efficient, a significant amount of user interaction is required to
define workable meshes, namely for the selection of the control points involved in the
tensor product. To overcome this drawback, an automatic strategy for choosing the
control points is desirable. Here, following the approach first discussed in [1], we
present an algebraic Hermite-type transfinite method to construct a grid interpolating
the boundary and its normal derivatives. In fact, given a "quadrilateral" domain
$\Omega \subset \mathbb{R}^2$, a transformation $G : R = [0,1] \times [0,1] \to \Omega$
is defined as

$$G(s,t) := T_P(s,t) + (P_1 \oplus P_2)([\phi,\psi] - T_P)(s,t) \qquad (1.1)$$

where $T_P$ is a tensor product surface, i.e.
$T_P(s,t) := \sum_{i=1}^{m} \sum_{j=1}^{n} Q_{ij} B_{i,3}(s) B_{j,3}(t)$,
$B_{i,3}$ denoting the usual cubic B-spline, $\phi$ and $\psi$ are boundary curves and
$(P_1 \oplus P_2)$ is the Boolean sum of Hermite-type blending function linear
operators. The set $Q = \{Q_{ij},\ i = 1, \dots, m,\ j = 1, \dots, n\}$ is the set of
control points.

As  already  noted,  the  choice  of  the  control  points  is  a  crucial  matter.  In  this  paper 
we  take  into  account  a  grid  quality  measure  parameter  for  their  selection.  In  particular, 
the  proposed  automatic  procedure  relies  on  the  fact  that  some  grid  properties  can  be 
described  in  terms  of  the  condition  number  of  matrices  linked  to  the  covariant  metric 
tensors [4]. Therefore, the control points are chosen so as to minimize the condition numbers of these matrices.

The  outline  of  this  paper  is  as  follows.  In  Section  2,  the  transformation  (1.1)  is  given 
in  detail  and  its  properties  are  investigated.  In  Section  3,  a  way  for  choosing  the  control 
points  is  proposed  relying  on  a  particular  quality  measure  parameter.  Finally,  in  Section 
4  some  numerical  results  are  presented  to  illustrate  the  features  of  the  proposed  strategy. 

2  The  transformation 

In this section the transformation (1.1) is characterized. Let us consider a
"quadrilateral" domain $\Omega \subset \mathbb{R}^2$ such that
$\partial\Omega = \bigcup_{i=1}^{4} \partial\Omega_i$, with $\partial\Omega_1$,
$\partial\Omega_2$, $\partial\Omega_3$, $\partial\Omega_4$ being the supports of four
regular curves $\gamma_i : [0,1] \to \partial\Omega_i$, $i = 1, \dots, 4$, taken
counterclockwise. Furthermore, let us suppose that
$\partial\Omega_1 \cap \partial\Omega_3 = \emptyset$ and
$\partial\Omega_2 \cap \partial\Omega_4 = \emptyset$, with any other intersection
occurring only at the end points of the boundary curves. In particular, the following
compatibility conditions are assumed

$$\gamma_1(0) = \gamma_4(1), \quad \gamma_1(1) = \gamma_2(0), \quad \gamma_2(1) = \gamma_3(0), \quad \gamma_4(0) = \gamma_3(1).$$

For later convenience, we set $\phi_1(s) := \gamma_1(s)$, $\phi_2(s) := \gamma_3(1-s)$,
denoting by $s$ the curve parameter running on $[0,1]$, and we set
$\psi_1(t) := \gamma_4(1-t)$, $\psi_2(t) := \gamma_2(t)$, denoting by $t$ the curve
parameter running on $[0,1]$. In addition, the components of the $\phi$-curves and
$\psi$-curves are denoted by $\phi^x, \phi^y$ and $\psi^x, \psi^y$ respectively.

Next, we define four additional curves by computing the derivatives of the $\phi$- and
$\psi$-curves, i.e.,


&+2O)  =  («(»))')  .  •  i  =  1, 2  , 

'  (2-1) 
Mt)  =  wm')  -  j  =  v 2 , 

with  C  a  constant  value  also  depending  on  the  curve  orientations  and  with  ||  •  ||2  the 
Euclidean  norm.  Then,  we  introduce  the  linear  operators 

Pi[4>](s,t)  :=  X!i=l  )  .  P2bl>](s,t)  :=  Y?j= 1  0)1/7 M  > 

(2.2) 

where $u_1 = 0$, $u_2 = 1$. The functions $\alpha_i$, $i = 1, \dots, 4$, are the dilated versions of the


C Conti,  R.  Morandi  and  D.  Scaramelli 


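classical Hermite bases with support on $[0, u]$ and on $[1-u, 1]$, with $0 < u < 1$, i.e.

$$\alpha_1(s) := \left(1 + 2\tfrac{s}{u}\right)\left(1 - \tfrac{s}{u}\right)^2, \quad \alpha_3(s) := s\left(1 - \tfrac{s}{u}\right)^2, \quad s \in [0,u],$$

$$\alpha_2(s) := \left(3 - 2\tfrac{s-1+u}{u}\right)\left(\tfrac{s-1+u}{u}\right)^2, \quad \alpha_4(s) := (s-1)\left(\tfrac{s-1+u}{u}\right)^2, \quad s \in [1-u,1]. \qquad (2.3)$$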
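As an illustration of (2.3), the following minimal Python sketch evaluates dilated Hermite blending functions and lets one check their interpolation properties numerically. It assumes that each function is extended by zero outside its support and that the dilation parameter is written `u`; the names `hermite_blending` and `one_sided_derivative` are hypothetical helpers introduced here, not part of the original method.

```python
def hermite_blending(u):
    """Return (alpha1, alpha2, alpha3, alpha4), dilated cubic Hermite bases.

    Assumption: each function is set to zero outside its support,
    [0, u] for alpha1 and alpha3, [1 - u, 1] for alpha2 and alpha4.
    """
    if not 0.0 < u < 1.0:
        raise ValueError("u must lie in (0, 1)")

    def alpha1(s):  # value 1 at s = 0, zero slope at s = 0
        r = s / u
        return (1 + 2 * r) * (1 - r) ** 2 if 0.0 <= s <= u else 0.0

    def alpha3(s):  # value 0 at s = 0, unit slope at s = 0
        r = s / u
        return s * (1 - r) ** 2 if 0.0 <= s <= u else 0.0

    def alpha2(s):  # value 1 at s = 1, zero slope at s = 1
        r = (s - 1 + u) / u
        return (3 - 2 * r) * r ** 2 if 1.0 - u <= s <= 1.0 else 0.0

    def alpha4(s):  # value 0 at s = 1, unit slope at s = 1
        r = (s - 1 + u) / u
        return (s - 1) * r ** 2 if 1.0 - u <= s <= 1.0 else 0.0

    return alpha1, alpha2, alpha3, alpha4


def one_sided_derivative(f, s, h=1e-6, side=+1):
    """Finite-difference slope of f at s (side=+1 forward, side=-1 backward)."""
    return side * (f(s + side * h) - f(s)) / h
```

With `u = 0.25`, for instance, `alpha1(0.0)` and `alpha2(1.0)` equal one, all four functions vanish at `s = 0.5`, and the one-sided slopes of `alpha3` at 0 and of `alpha4` at 1 are close to one. This locality is what confines the influence of the boundary data to a strip of width `u`.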

The Boolean sum operator $(P_1 \oplus P_2) = P_1 + P_2 - P_1 P_2$ provides the blending
function surface

$$B(s,t) := (P_1 \oplus P_2)[\phi,\psi](s,t) = P_1[\phi](s,t) + P_2[\psi](s,t) - P_1 P_2[\phi,\psi](s,t). \qquad (2.4)$$

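It is known that $B$ satisfies

$$B(u_i,t) = \psi_i(t),\ i = 1,2, \qquad \frac{\partial B}{\partial s}(u_i,t) = \psi_i(t),\ i = 3,4,$$

$$B(s,w_j) = \phi_j(s),\ j = 1,2, \qquad \frac{\partial B}{\partial t}(s,w_j) = \phi_j(s),\ j = 3,4, \qquad (2.5)$$

where $u_1 = u_3 = 0$, $u_2 = u_4 = 1$ and $w_1 = w_3 = 0$, $w_2 = w_4 = 1$. It is worthwhile to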

remark  that,  as  we  are  dealing  with  orthogonal  grid  lines  emanating  from  the  boundary 
of  the  domain,  the  intersecting  boundary  curves  must  be  also  orthogonal.  Thus,  the 
following  additional  conditions  are  assumed: 

0)  =  ip'liwi),  <j)i+2{  1)  =  i/4(wi)  , 

«/’»+ 2(0)  =</>i(«i),  ^i+2(l)  =  02(«i)>  i  =  l,2.  (2.6) 

<!>"{  0)  =ip'i{wi),  =ip2(v>i), 

Now, in order to define a suitable grid, following the approach given in [1], we use
the linear transformation $G$

$$G(s,t) := T_P(s,t) + (P_1 \oplus P_2)([\phi,\psi] - T_P)(s,t) \qquad (2.7)$$

where $T_P(s,t) := \sum_{i=1}^{m} \sum_{j=1}^{n} Q_{ij} B_{i,3}(s) B_{j,3}(t)$, with
$B_{i,3}$ denoting the usual cubic B-splines with uniform knots. The set
$Q = \{Q_{ij},\ i = 1, \dots, m,\ j = 1, \dots, n\}$ is a suitable set of control points
whose definition is discussed in Section 3. It should be noted that in (2.7) the
Boolean sum operator is also acting on a surface $T_P(s,t)$. In this case (2.2) is used
taking the eight boundary curves $T_P(0,t)$, $T_P(1,t)$, $T_P(s,0)$, $T_P(s,1)$,
$\frac{\partial T_P(0,t)}{\partial s}$, $\frac{\partial T_P(1,t)}{\partial s}$,
$\frac{\partial T_P(s,0)}{\partial t}$, $\frac{\partial T_P(s,1)}{\partial t}$.

It is easy to show that $G$ still satisfies $G(u_i,t) = \psi_i(t)$, $i = 1,2$,
$\frac{\partial G}{\partial s}(u_i,t) = \psi_i(t)$, $i = 3,4$,
$G(s,w_j) = \phi_j(s)$, $j = 1,2$ and
$\frac{\partial G}{\partial t}(s,w_j) = \phi_j(s)$, $j = 3,4$. Furthermore, because
of the locality of the blending functions $\alpha_i$, $i = 1, \dots, 4$, the control of
the coordinate lines obtained by means of the evaluation of $G$ over a parameter set in
the interior of the domain is mainly based on the contribution of $T_P$. This fact and
the use of B-splines ensure the convex-hull property in the interior of the domain.
This property is of importance in numerical grid generation to locate the grid with
respect to the position of control points.
3  Grid quality measure

It is well known that grid generation techniques sensitive to grid quality features are
particularly attractive. Thus, in this section, we discuss a strategy to choose the set
$Q$ of control points based on a suitable grid quality measure parameter.

Given a set of grid points $\mathcal{G} := \{G_{ij}\}_{i,j=1}^{M,N}$ defining the
quadrilateral cells $C_{ij}$, quality measures can commonly include: grid "skewness",
measuring the departure of $C_{ij}$ from a rectangle, grid "aspect ratio", measuring
the departure of $C_{ij}$ from a rhombus, or grid "conformality", measuring the
departure of $C_{ij}$ from a square (see for instance


Here, as done in [4] for the case of unstructured grids, we define a grid quality
measure taking into account the condition number of particular matrices derived from
the grid. As explained below, this quality parameter measures, in a suitable sense, the
departure of $C_{ij}$ from a square.

The strategy starts with a set $Q^i$ of control points obtained by evaluating on a
coarse parameter set $S_c = \{(s_i,t_j)\}_{i,j=1}^{m,n}$ a Lagrange blending function
surface (for details on Lagrange blending function methods we refer, for instance, to
[3]) by working only with the four boundary curves of the given domain. Then, using
$Q^i$, a first grid is obtained by evaluating the surface $G$ in (2.7) on a fine
parameter set $S_f = \{(s_i,t_j)\}_{i,j=1}^{M,N}$, obtaining the grid points

$$\mathcal{G} := \{G_{i,j} = (G^x_{i,j}, G^y_{i,j}) = G(s_i,t_j),\ i = 1, \dots, M,\ j = 1, \dots, N\}.$$
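The initial control points come from a Lagrange blending function surface built from the four boundary curves alone. A minimal Python sketch of that step follows, under the assumption that the simplest such surface is meant, namely the bilinearly blended (Coons) patch with linear Lagrange blending functions. The function names and the example curves are hypothetical and chosen only for illustration.

```python
def coons_patch(phi1, phi2, psi1, psi2):
    """Bilinearly blended surface B(s, t) from four boundary curves.

    phi1(s), phi2(s): curves at t = 0 and t = 1; psi1(t), psi2(t): curves
    at s = 0 and s = 1. Each returns an (x, y) tuple and the curves are
    assumed to agree at the four corners.
    """
    c00, c10 = phi1(0.0), phi1(1.0)
    c01, c11 = phi2(0.0), phi2(1.0)

    def B(s, t):
        p_s, p_t = (psi1(t), psi2(t)), (phi1(s), phi2(s))
        point = []
        for k in range(2):  # x and y components
            along_s = (1 - s) * p_s[0][k] + s * p_s[1][k]
            along_t = (1 - t) * p_t[0][k] + t * p_t[1][k]
            corners = ((1 - s) * (1 - t) * c00[k] + s * (1 - t) * c10[k]
                       + (1 - s) * t * c01[k] + s * t * c11[k])
            point.append(along_s + along_t - corners)  # Boolean sum
        return tuple(point)

    return B


def sample(B, m, n):
    """Evaluate B on a uniform m x n parameter set (m, n >= 2)."""
    return [[B(i / (m - 1), j / (n - 1)) for j in range(n)] for i in range(m)]


# Hypothetical example domain: unit square with a curved bottom edge.
bottom = lambda s: (s, 0.8 * s * (1 - s))
top = lambda s: (s, 1.0)
left = lambda t: (0.0, t)
right = lambda t: (1.0, t)

patch = coons_patch(bottom, top, left, right)
initial_control_points = sample(patch, 5, 4)  # plays the role of Q^i
```

The patch reproduces each boundary curve exactly, so the sampled points on the edges of the parameter square lie on the boundary of the domain.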

The set $\mathcal{G}$ is then used to define $(M-1) \times (N-1)$ matrices of size
$2 \times 2$ associated with the $(M-1) \times (N-1)$ quadrilateral cells $C_{ij}$,
$i = 1, \dots, M-1$, $j = 1, \dots, N-1$. These matrices are defined as

$$A_{ij} := \begin{pmatrix} G^x_{i+1,j} - G^x_{i,j} & G^x_{i,j+1} - G^x_{i,j} \\ G^y_{i+1,j} - G^y_{i,j} & G^y_{i,j+1} - G^y_{i,j} \end{pmatrix}, \quad i = 1, \dots, M-1,\ j = 1, \dots, N-1, \qquad (3.1)$$

and their condition number $K(A_{ij})$ is related to the stretch of the cells. In fact,
it is easy to prove that $K(A_{ij}) := \|A_{ij}\|_2 \cdot \|A_{ij}^{-1}\|_2 = 1$ if and
only if we are dealing with a cell $C_{ij}$ where the three points $G_{i,j+1}$,
$G_{i,j}$, $G_{i+1,j}$ generate half a square [9]. On the other hand, in order to
involve all the grid points in the quality measure it is also convenient to define the
boundary matrices


namely $2 \times 2$ matrices of the same type as (3.1), built from the coordinate
differences between a boundary grid point and its neighbouring grid points: one family
for the points $G_{i,N}$, one for the points $G_{M,j}$, and one for the corner point
$G_{M,N}$, (3.2)

so  that  the  boundary  points  are  also  taken  into  account. 
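To make the role of the condition number concrete, here is a small Python sketch that computes the 2-norm condition number of a cell matrix in closed form. It assumes that the matrix of an interior cell has as columns the two edge vectors $G_{i+1,j} - G_{i,j}$ and $G_{i,j+1} - G_{i,j}$, as suggested by the remark on the three points generating half a square. The helper names are hypothetical, and boundary matrices are not covered.

```python
import math


def cond2(a, b, c, d):
    """2-norm condition number of the matrix [[a, b], [c, d]].

    Uses sigma_max / sigma_min = (F + sqrt(F^2 - 4 det^2)) / (2 |det|),
    where F is the squared Frobenius norm.
    """
    frob2 = a * a + b * b + c * c + d * d
    det = abs(a * d - b * c)
    if det == 0.0:
        return math.inf  # degenerate (collapsed) cell
    disc = math.sqrt(max(frob2 * frob2 - 4.0 * det * det, 0.0))
    return (frob2 + disc) / (2.0 * det)


def interior_condition_numbers(grid):
    """grid[i][j] = (x, y). Returns K for every cell, using the edge
    vectors leaving the node (i, j) (an assumption about (3.1))."""
    M, N = len(grid), len(grid[0])
    K = []
    for i in range(M - 1):
        row = []
        for j in range(N - 1):
            x0, y0 = grid[i][j]
            x1, y1 = grid[i + 1][j]
            x2, y2 = grid[i][j + 1]
            row.append(cond2(x1 - x0, x2 - x0, y1 - y0, y2 - y0))
        K.append(row)
    return K


# Uniform square cells and cells stretched by a factor 2 in x.
square_grid = [[(0.25 * i, 0.25 * j) for j in range(5)] for i in range(5)]
stretched_grid = [[(0.5 * i, 0.25 * j) for j in range(5)] for i in range(5)]
```

On the square grid every cell has condition number one, while on the stretched grid every cell has condition number two, the aspect ratio of the cells. Averaging such values over the cells gives a quantity that equals one only for an ideal grid.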

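Next, we modify the initial set $Q^i$ of control points minimizing the following
objective function

$$f_{ob} = \frac{1}{MN} \left( \sum_{i=1}^{M-1} \sum_{j=1}^{N-1} K(A_{ij}) + \sum_{i=1}^{M-1} K(A^{b}_{i,N-1}) + \sum_{j=1}^{N-1} K(A^{b}_{M-1,j}) + K(A^{b}_{M-1,N-1}) \right), \qquad (3.3)$$

where the superscript $b$ marks the boundary matrices of (3.2).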


The minimization is done with respect to the control points under suitable constraints
on their coordinates depending on the geometry of the domain $\Omega$. This is the only
user interaction required.

Obviously, since ideal inner cells are characterized by an associated matrix $A_{ij}$
having a condition number close to one, the optimal distribution of the control points
should guarantee $\min_Q f_{ob} \approx 1$. On the other hand, $\min_Q f_{ob}$ strongly
depends on the geometry of the domain (for example, in the case of a square domain the
optimal value is $\min_Q f_{ob} = 1$ while, in general, this value is not reached).


Summary  of  the  Method 

(1) Compute the initial set of control points $Q^i$ by means of a Lagrange blending
function method using the four given boundary curves;

(2) Compute the initial grid $\mathcal{G}^i = \{G(s_i,t_j),\ i = 1, \dots, M,\ j = 1, \dots, N\}$
with $G$ given in (2.7) by using the set of control points $Q^i$;

(3) Minimize the objective function (3.3), so defining a new set of control points $Q^f$;

(4) Compute the final grid $\mathcal{G}^f = \{G(s_i,t_j),\ i = 1, \dots, \bar{M},\ j = 1, \dots, \bar{N}\}$
with $G$ given in (2.7) by using the set of control points $Q^f$, with $\bar{M} \gg M$,
$\bar{N} \gg N$.

Remark 3.1 We note that, in order to reduce the computational cost of the minimization
procedure, the integers $M$ and $N$ are chosen less than $\bar{M}$ and $\bar{N}$.


4  Numerical  Results 

We  conclude  the  paper  giving  some  numerical  results  testing  the  properties  of  the  trans¬ 
formation  G  and  showing  the  performance  of  the  proposed  approach. 

Three domains are considered. For each of them we present the initial grid obtained
by the transformation $G$ using the initial set of control points $Q^i$ and the final
grid obtained using the set of control points $Q^f$ resulting from the minimization
procedure. In all the figures the control points are denoted by the symbol '*'. The
minimization problem is solved by using a sequential quadratic programming method,
i.e. by using the routine constr of the Optimization toolbox of the Matlab package. In
the minimization procedure, the constraints on the control points $Q^f$ are chosen so
that some geometric properties of the domain, such as symmetry and convexity, are
preserved. Furthermore, in all the examples $\bar{M}$ and $\bar{N}$ are equal to $M$
and to $N$. The values of the objective function before the minimization ($f_{ob}$)
and after the minimization ($f^*_{ob}$) are also given in the figure captions.

The first and the second test display a "waterway" grid and a U-shaped grid with
their control points before and after the minimization procedure. The effectiveness of
the method is evident.

Figure 3 shows a grid composed of six sub-grids, obtained via a domain decomposition
approach. In this case, the Hermite-type interpolation method guarantees a $C^1$
connection among the patches. Here, the two values of $f_{ob}$ and $f^*_{ob}$ in the
figure captions refer to the "horizontal" and "slanted" grids, respectively.
Abstract

We introduce a new algorithm for solving optimal shape problems that are defined with
respect to a pair of geometrical elements. The problem is to find, approximately, the
optimal domain for a given functional that involves the solution of a linear or
nonlinear elliptic equation with a boundary condition over a domain. The Shape-Measure
method, in Cartesian coordinates, will be used to find the nearly optimal solution in
two steps. By transferring the problem into a measure-theoretical form, first we will
find the solution of the elliptic problem for a given domain by using the embedding
method. Then the Shape-Measure method will be applied to find the best domain
approximately. An example will be given.


1  Introduction  and  Problem 

Consider optimal shape (optimal shape design) problems that are defined with respect to
a pair of geometrical elements; this pair consists of a measurable set (in
$\mathbb{R}^2$), which can be regarded as a domain, and a simple closed curve
containing a given point, which is the boundary of the set. Because the desired curves
are required to be simple, the problem depends on the geometry which is used. In polar
coordinates, we solved a similar problem in [1]. But in Cartesian coordinates, it is
difficult to introduce a linear condition which determines the property of a closed
curve being simple. Thus here we impose some limitation on the shape in order to make
sure that it is simple. The problem will be solved in two stages. First, by use of
measures, the value of the objective function will be calculated for any given domain.
Then the optimal domain will be obtained by use of optimization techniques.

Let $D \subset \mathbb{R}^2$ be a bounded domain with a piecewise-smooth, closed and
simple boundary $\partial D$. We assume that some part of $\partial D$ is fixed and the
rest, $\Gamma$, with the given initial and final points $A$ and $B$ respectively, is
not fixed. Here we suppose that the fixed part of $\partial D$ is made by three
segments, parts of the lines $y = 0$, $x = 0$ and $y = 1$ between the points $A(1,0)$,
$(0,0)$, $(0,1)$, $B(1,1)$ (see Figure 1).

Fig. 1. An admissible domain D under the assumptions of the numerical work.

Thus $\Gamma$ is chosen as an appropriate variable curve joining $A$ and $B$ so that
$u$ is well-defined. Let $u(X)$ ($X = (x,y) \in \mathbb{R}^2$) be a bounded solution of
the following elliptic equation:

$$\Delta u(X) + f(X,u) = v(X), \qquad u|_{\partial D} = 0, \qquad (1.1)$$

where $X \in D \mapsto v(X) \in \mathbb{R}$ is a bounded real function ($v$ can also be
considered as a fixed control function); the function $f$ is assumed to be a bounded
and continuous real-valued function in $L_2(D \times \mathbb{R})$. Moreover, the above
domain $D$ is called admissible if the equation (1.1) has a bounded solution on $D$; we
denote by $\mathcal{D}$ the set of all such admissible domains. We are going to solve
the problem of minimizing the functional $I(D) = \int_D f_0(X,u)\,dX$ on the set
$\mathcal{D}$, where $f_0$ is a given continuous, nonnegative, real-valued function on
$D \times \mathbb{R}$. To calculate the value of $I(D)$ for a given domain $D$, it is
necessary first to identify the solution of (1.1).

2  Weak  solution  and  metamorphosis 

In general, it is difficult to identify a classical solution for a problem like (1.1),
and usually one tries to find a weak (generalized) solution instead. Hence the
variational form of (1.1) is introduced in the following; we remind the reader that
$H_0^1(D) = \{\psi \in H^1(D) : \psi|_{\partial D} = 0\}$, where $H^1(D)$ is the Sobolev
space of order 1.

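Proposition 2.1 Let $u$ be the classical solution of (1.1); then we have the following
equality:

$$\int_D (u\,\Delta\psi + \psi f)\,dX = \int_D \psi v\,dX, \quad \forall \psi \in H_0^1(D). \qquad (2.1)$$

Proof: Multiplying (1.1) by the function $\psi \in H_0^1(D)$ and then integrating over
$D$, with use of Green's formula (see [3]), gives
$\int_D (u\,\Delta\psi + \psi f - \psi v)\,dX = \int_{\partial D} \left(u \frac{\partial \psi}{\partial n} - \psi \frac{\partial u}{\partial n}\right) dS$,
where $n$ is the unit vector normal to the boundary $\partial D$ and directed outward
with respect to $D$. Because $\psi|_{\partial D} = 0$ and $u|_{\partial D} = 0$, the
boundary integral vanishes and (2.1) is satisfied. □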
Definition 2.2 A function $u \in H^1(D)$ is called a bounded weak solution of the
problem (1.1) when it is bounded and satisfies the equality (2.1) for all
$\psi \in H_0^1(D)$ (the conditions for existence of the weak solution of the problem
(1.1), and also its boundedness property, have been considered in many references,
like [3] and [2]).

Now we apply our new approach, which is called the Shape-Measure method. Let
$\Omega = U \times D$, where $U \subset \mathbb{R}$ is the smallest bounded set in which
the bounded weak solution $u(\cdot)$ takes values. Then by applying the Riesz
Representation Theorem ([6]), a bounded weak solution can be represented by a positive
Radon measure; the proof of the following Proposition is similar to that of
Proposition 3.1 in [1].

Proposition 2.3 Let $u(X)$ be a bounded generalized solution of (1.1); there exists a
unique positive Radon measure, say $\mu_u$, in $\mathcal{M}^+(\Omega)$ such that:

$$\mu_u(F) = \int_\Omega F\,d\mu_u = \int_D F(X,u)\,dX, \quad \forall F \in C(\Omega). \qquad (2.2)$$

Thus the equality (2.1) can be changed to $\mu_u(F_\psi) = \gamma_\psi$,
$\forall \psi \in H_0^1(D)$, where $F_\psi = u\,\Delta\psi + \psi f$ and
$\gamma_\psi = \int_D \psi v\,dX$. Also, $I(D) = \mu_u(f_0)$. Because the measure
$\mu_u$ projects on the $(x,y)$-space as the respective Lebesgue measure, we should
have $\mu_u(\xi) = a_\xi$, where $\xi : \Omega \to \mathbb{R}$ depends only on the
variable $X$ (i.e. $\xi \in C_1(\Omega)$), and $a_\xi$ is the Lebesgue integral of
$\xi$ over $D$. Therefore the original problem can be described as follows:

To find a measure $\mu_u \in \mathcal{M}^+(\Omega)$ so that it satisfies the following
constraints:

$$\mu_u(F_\psi) = \gamma_\psi, \quad \forall \psi \in H_0^1(D);$$

$$\mu_u(\xi) = a_\xi, \quad \forall \xi \in C_1(\Omega). \qquad (2.3)$$

As Rubio did in [5], to be sure that we do not miss any solution, we extend the
underlying space; instead of finding a measure $\mu_u \in \mathcal{M}^+(\Omega)$,
introduced by Proposition 2.3 and equalities (2.3), we seek a measure
$\mu \in \mathcal{M}^+(\Omega)$ which satisfies just the conditions:

$$\mu(F_\psi) = \gamma_\psi, \quad \forall \psi \in H_0^1(D);$$

$$\mu(\xi) = a_\xi, \quad \forall \xi \in C_1(\Omega). \qquad (2.4)$$

3  Approximation

The system (2.4) is linear because all the functionals appearing in the equations are
linear in their argument $\mu$. But the number of equations and the underlying space
are not finite. We shall approximate this system by requiring that only a finite number
of the constraints are satisfied. This will be achieved by choosing countable sets of
functions whose linear combinations are dense in the appropriate spaces. But first we
should approximate the unknown part of the boundary by a finite number of its points.
This idea comes from the approximation of a curve by broken lines. For the given $D$,
and hence for the given $\Gamma$, let $A_m = (x_m, y_m)$, $m = 0, 1, 2, \dots, M$, be a
finite number of points on $\Gamma$ (where $A_0 = A$). We link together each pair of
consecutive points $A_m$ and $A_{m+1}$ for $m = 0, 1, \dots, M-1$ and close this curve
by joining the points $A_M$ and $B$ together. Now the resulting shape, which is denoted
by $\partial D_M$, is an approximation for
Shape-measure  method 


$\partial D$; we also denote by $D_M$ the domain enclosed by the boundary
$\partial D_M$ (see Figure 1).

It is possible that, by increasing $M$, the curve $\partial D_M$ will become closer and
closer (in the Euclidean metric) to the curve $\partial D$, and hence one may conclude
that the minimizer of $I$ over $\mathcal{D}_M$, if it exists, tends to the minimizer of
$I$ over $\mathcal{D}$, if it exists. But some difficulties could arise (too
oscillatory a curve may cause problems). Thus, we will fix the number of points. For a
given $M$, let the values of the components $y_1, y_2, \dots, y_M$ be fixed. Because
$x_m$ is a free term, the point $A_m$ could be anywhere on the line $y = Y_m$, $x > 0$,
for every $m$ (see Figure 1). Therefore points $A_m$ and $A_{m+1}$ can be chosen so
that they belong to $\Gamma$, and hence the part of $\Gamma$ between the lines
$y = Y_m$ and $y = Y_{m+1}$ can be approximated by the segment $A_m A_{m+1}$. Hence, we
do not lose generality. Thus, we fix the components $y_1, y_2, \dots, y_M$ with the
values $Y_1, Y_2, \dots, Y_M$, respectively.
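The broken-line approximation can be sketched in a few lines of Python. The example below builds the polygon that bounds an approximating domain from the free abscissae and the fixed ordinates, and computes its area, which is the Lebesgue integral of the constant function 1 over that domain. It assumes that the fixed ordinates are strictly increasing in (0, 1) and that all abscissae are positive, which keeps the polygon simple; the function names are hypothetical.

```python
def polygon_vertices(xs, Ys):
    """Counterclockwise vertices (0,0), A, A_1, ..., A_M, B, (0,1).

    xs: free abscissae x_1..x_M; Ys: fixed ordinates Y_1..Y_M.
    Assumption: 0 < Y_1 < ... < Y_M < 1 and every x_m > 0.
    """
    if len(xs) != len(Ys):
        raise ValueError("xs and Ys must have the same length")
    if any(x <= 0.0 for x in xs):
        raise ValueError("every x_m must be positive")
    levels = [0.0] + list(Ys) + [1.0]
    if any(lo >= hi for lo, hi in zip(levels, levels[1:])):
        raise ValueError("Ys must be strictly increasing in (0, 1)")
    A, B = (1.0, 0.0), (1.0, 1.0)
    return [(0.0, 0.0), A] + list(zip(xs, Ys)) + [B, (0.0, 1.0)]


def shoelace_area(vertices):
    """Area of a simple polygon given by its vertices in order."""
    total = 0.0
    n = len(vertices)
    for k in range(n):
        x0, y0 = vertices[k]
        x1, y1 = vertices[(k + 1) % n]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


# With all x_m = 1 the variable boundary is the segment AB (unit square).
unit = polygon_vertices([1.0, 1.0, 1.0], [0.25, 0.5, 0.75])
bulged = polygon_vertices([1.5], [0.5])
```

Here `shoelace_area(unit)` is the area of the unit square, and `shoelace_area(bulged)` adds the triangle with vertices A, A_1 and B. Since the ordinates are fixed, any quantity of this kind depends on the abscissae only.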

Now we introduce the set $\{\psi_i \in H_0^1(D) : i = 1, 2, \dots\}$ so that the linear
combinations of the functions $\{\psi_i\}$ are uniformly dense (that is, dense in the
topology of uniform convergence) in $H_0^1(D)$. We know that the vector space of
polynomials in the variables $x$ and $y$, $P(x,y)$, is dense in $C^\infty(D)$;
therefore the set
$P_0(x,y) = \{p(x,y) \in P(x,y) \mid p(x,y) = 0,\ \forall (x,y) \in \partial D\}$ is
dense (uniformly) in $\{h \in C^\infty(D) : h|_{\partial D} = 0\} \equiv C_0^\infty(D)$.
Since the set

$$Q(x,y) = \{1, x, y, x^2, xy, y^2, x^3, x^2 y, x y^2, y^3, \dots\}$$

is a countable base for the vector space $P(x,y)$, each element of $P(x,y)$, and also of
$P_0(x,y)$, is a linear combination of the elements in $Q(x,y)$. By Theorem 3 of
Mikhailov [3], page 131, the space $C^\infty(D)$ is dense in $H^1(D)$; thus the space
$C_0^\infty(D)$ will be dense in $H_0^1(D)$. Consequently, the space $P_0(x,y)$ is
uniformly dense in $H_0^1(D)$. We define

$$\psi_i(x,y) = x\,y\,(y-1) \prod_{l=1}^{M} (x - x_l + y - Y_l)\; q_i(x,y), \qquad (3.1)$$

where $q_i \in Q(x,y)$. Therefore $\psi_i|_\Gamma = 0$ and the set
$\{\psi_i(x,y) : i = 1, 2, \dots\}$ is total (dense in the topology of uniform
convergence) in $H_0^1(D)$.

For the second set of functions, let $L$ be a given positive integer and divide $D$
into $L$ (not necessarily equal) parts $D_1, D_2, \dots, D_L$, so that by increasing
$L$ the area of $D_s$, $s = 1, 2, \dots, L$, will be decreased. Then, for each $s$ we
define:

$$\xi_s(x,y,u) := \begin{cases} 1 & \text{if } (x,y) \in D_s, \\ 0 & \text{otherwise.} \end{cases}$$

These functions are not continuous, but each of them is the limit of an increasing
sequence of positive continuous functions, $\{\xi_{s_k}\}$; then if $\mu$ is any
positive Radon measure on $\Omega$, $\mu(\xi_s) = \lim_{k \to \infty} \mu(\xi_{s_k})$.
The linear combinations of the functions $\{\xi_j : j = 1, 2, \dots, L\}$, for all
positive integers $L$, can approximate a function in $C_1(\Omega)$ arbitrarily well
(see [5], Chapter 5).


By selecting just a finite number of functions in the mentioned spaces, the problem
(2.4) can be replaced by another one in which we are looking for the measure
$\mu_{M_1,M_2} \in \mathcal{M}^+(\Omega)$, so that it satisfies the following
constraints:

$$\mu_{M_1,M_2}(F_i) = \gamma_i, \quad i = 1, 2, \dots, M_1;$$

$$\mu_{M_1,M_2}(\xi_j) = a_j, \quad j = 1, 2, \dots, M_2, \qquad (3.2)$$

where $M_1$ and $M_2$ are two positive integers and $F_i = F_{\psi_i}$,
$\gamma_i = \gamma_{\psi_i}$, $a_j = a_{\xi_j}$. If we denote by $Q(M_1,M_2)$ the set of
positive Radon measures in $\mathcal{M}^+(\Omega)$ which satisfy equalities (3.2), and
also denote by $Q$ the set of positive Radon measures in $\mathcal{M}^+(\Omega)$ which
satisfy equalities (2.4), one can easily prove the following Proposition by considering
the proof of Proposition III.1 in [5].

Proposition 3.1 If $M_1, M_2 \to \infty$ then $Q(M_1,M_2) \to Q$; hence for large
enough numbers $M_1$ and $M_2$ the set $Q$ can be identified by $Q(M_1,M_2)$.

But even if the number of equations in (3.2) is finite, the underlying space
$Q(M_1,M_2)$ is still infinite-dimensional. By Theorem A.5 in the Appendix of [5], the
measure $\mu_{M_1,M_2}$ in (3.2) can be characterized as
$\mu_{M_1,M_2} = \sum_{n=1}^{M_1+M_2} \alpha_n \delta(Z_n)$ with triples
$Z_n \in \Omega$ and the coefficients $\alpha_n \ge 0$ for $n = 1, 2, \dots, M_1+M_2$,
where $\delta(z) \in \mathcal{M}^+(\Omega)$ is supposed to be a unitary atomic measure
with support the singleton set $\{z\}$. Thus the measure problem is equivalent to a
nonlinear one in which the unknowns are the coefficients $\alpha_n$ and supports
$\{Z_n\}$. Proposition III.3 of [5], Chapter 3, states that the measure
$\mu_{M_1,M_2}$ has the following form

$$\mu_{M_1,M_2} = \sum_{n=1}^{N} \alpha_n \delta(Z_n), \qquad (3.3)$$

where $Z_n$, $n = 1, 2, \dots, N$, belongs to a dense subset of $\Omega$. Now let us put
a discretization on $\Omega$, with the nodes $Z_n = (x_n, y_n, u_n)$ in a dense subset
of $\Omega$; then we can set up the following linear system in which the unknowns are
the coefficients $\alpha_n$:


a„  >  0, 

71  —  1)2,. 

N 

^  OtnF i(Zn )  =  7n 

7  =  1,2,.. 

■ ,  Mi\ 

n— 1 

N 

®n€j(Zn)  =::  aj  ? 

i  =  l,2... 

■\m2. 

(3.4) 

n=l 


The solution of (3.4) is not necessarily unique (even if problem (1.1) satisfies
the necessary conditions for having a unique bounded weak solution), because of the
approximation scheme.

4  The  optimal  solution 

The main aim of the present section is to find an optimal domain $D^* \in \mathcal{D}_M$ so that
the value of $I(D^*)$ is the minimum on the set $\mathcal{D}_M$. By applying the result of the
previous section, a solution of (1.1) can be found. Indeed, it is approximated by a solution
of the linear system (3.4) according to the variables $x_m$, $m = 1, 2, \ldots, M$. As mentioned,
this solution is not necessarily unique. Let us specify one by solving the following linear
programming problem

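$$\text{Minimize:} \quad \sum_{n=1}^{N} \alpha_n f_0(Z_n)$$

$$\text{Subject to:} \quad \alpha_n \ge 0, \quad n = 1, 2, \ldots, N;$$

$$\sum_{n=1}^{N} \alpha_n F_i(Z_n) = \gamma_i, \quad i = 1, 2, \ldots, M_1;$$

$$\sum_{n=1}^{N} \alpha_n \xi_j(Z_n) = a_j, \quad j = 1, 2, \ldots, M_2. \qquad (4.1)$$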

Thus, for each $D$, the value $I(D) = \int_D f_0(X,u)\,dX = \mu(f_0) \approx \mu_{M_1,M_2}(f_0)$ is defined
uniquely in terms of the variables $x_m$, $m = 1, 2, \ldots, M$. So we set up a function, $J$, on
$\mathcal{D}_M$ defined by $D \in \mathcal{D}_M \mapsto J(D) = \mu_{M_1,M_2}(f_0)$, where $\mu_{M_1,M_2}(f_0) = \sum_{n=1}^{N} \alpha_n f_0(Z_n)$.
Clearly $J$ can be regarded as a vector function:

$$J : (x_1, x_2, \ldots, x_M) \in \mathbb{R}^M \mapsto \mu_{M_1,M_2}(f_0) \in \mathbb{R}. \qquad (4.2)$$

Since $J$ is a real-valued function which is bounded below, and is defined on a compact
set (since constraints are to be put on the variables), it is possible to find a sequence of
points so that the value of the function along the sequence tends to the (finite) infimum
of the function. The coordinate values corresponding to the points in the sequence are of
course finite. Now, suppose that $(x_1^*, x_2^*, \ldots, x_M^*)$ is the minimizer of the vector function
$J$; it can be identified by using one of the related minimization methods. The domain
determined by the minimizer is denoted by $D^*$. We assume in the following theoretical result
that the minimization algorithm which is used is perfect; that is, it comes out with the
global minimum of $J$ in its (compact) domain.


Theorem 4.1: Let $M$, $M_1$ and $M_2$ be the given positive integers which were defined in
Section 3, and $D^*$ be the minimizer of (4.2) as mentioned above. Then $D^*$ is the
minimizer domain of the functional $I$ over $\mathcal{D}_M$ and the value of $I(D^*)$ can be approximated
by $J(D^*)$; moreover $J(D^*) \to I(D^*)$ as $M_1$ and $M_2$ tend to infinity.

Proof: Suppose $D^*$ is not the minimizer of $I$; hence there exists a domain, call it $D'$,
in $\mathcal{D}_M$ so that $I(D') < I(D^*)$. Proposition 2.3 shows that there is a unique measure,
call it $\mu'$, in $\mathcal{M}^+(\Omega)$ so that $I(D') = \mu'(f_0)$, and also Proposition 3.1 states that for
sufficiently large numbers $M_1$ and $M_2$, $\mu'(f_0)$ can be approximated by $\mu'_{M_1,M_2}(f_0)$ in
$Q(M_1,M_2)$. Thus, $I(D') \approx \mu'_{M_1,M_2}(f_0) = J(D')$. In the same way, one can show that
$J(D^*)$ approximates $I(D^*)$; so $I(D^*) \approx \mu^*_{M_1,M_2}(f_0) = J(D^*)$. Hence $J(D') < J(D^*)$,
which contradicts the fact that $D^*$ is the minimizer of $J$. Moreover, from Proposition
3.1 it follows that $J(D^*)$ tends to $I(D^*)$ as $M_1, M_2 \to \infty$. □

5 Numerical example

We  consider  the  elliptic  equations  (1.1)  with 


f  1  if  (x,  y)  e  D  n  (7, 
\  0  otherwise, 


where  C  is  the  square  [|,  f  ]  x  [|,  |]  (see  Figure  1).  We  also  take  M  =  8,  Mi  =  10,  M2  =  8, 
N  -  740  (the  36  number  of  nodes  are  chosen  so  that  u\dD  =  0)  and  suppose  Yi,  y2, , . . ,  Yg 
are  0.15, 0.25, . . . ,  0.85,  respectively.  By  extra  constraints,  xrn  >  |  ,  m  =  2, 3, . . . ,  7,  the 

3  3 

value  of  t *  for  any  D  €T>m  is  defined  as  7*  =  fifi  y)  dxdy  ,  i  =  1, 2, . . . ,  10.  We 

4  4 

also assume that the function $u$ takes values in $U = [-1, 1]$, and consider the polynomials
$Q_i(x, y)$ as $1, x, y, x^2, xy, y^2, x^3, x^2y, xy^2, y^3$. The function $f_0$ is chosen as $f_0 = (u - 0.1)^2$.
This function can be considered as a distribution of heat in the surface for the system
governed by elliptic equations.

In the minimization, we apply the Downhill Simplex Method in Multidimensions by using
the subroutine AMOEBA (see [4]) and also consider an upper bound for the variables
(suppose they are not higher than 2). These conditions are applied by means of the
penalty method (see [7]). For the nonlinear case of the partial differential equations
(1.1), we have taken $f(x,y,u) = 0.25u^2$, and used the initial values $X_m = 1.0$, $m =
1, 2, \ldots, 8$, and the stopping tolerance for the program (variable ftol in the subroutine
AMOEBA) as $10^{-7}$. We remind the reader that the functions $F_i$ and the values of $\gamma_i$,
$i = 1, 2, \ldots, 10$, have been calculated by the package "Maple V". The results are: the
optimal value of $I$ = 0.45467920356379, the number of iterations = 502, and the values
of the variables in the final step are $X_1 = 1.05019$, $X_2 = 1.08521$, $X_3 = 0.750001$,
$X_4 = 0.768701$, $X_5 = 1.12986$, $X_6 = 1.13775$, $X_7 = 0.97783$, $X_8 = 1.61566$, which
represent the optimal domain shown in Figure 2.

Fig. 2. The initial and the optimal domain for the nonlinear case of elliptic equations

Abstract

Piecewise  linear  maps  over  triangulations  are  used  extensively  in  geometric  modelling  and 
computer  graphics.  This  short  note  surveys  recent  progress  on  the  important  question  of 
when  such  maps  are  one-to-one,  central  to  which  are  convex  combination  maps. 

1  Introduction 

Piecewise linear maps over triangulations have several applications in geometric modelling
and computer graphics. For example, Figure 1a shows a surface triangulation $T$ of a
set of points $(x_i, y_i, z_i)$ sampled from some unknown surface in $\mathbb{R}^3$. A standard approach
to fitting a smooth parametric surface $s(u, v)$ to these points is to first parameterize them,
i.e., compute planar points $(u_i, v_i)$ corresponding to the data points $(x_i, y_i, z_i)$. Then using
some scattered data method, we find a parametric surface $s : \Omega \to \mathbb{R}^3$, defined over
some suitable domain $\Omega$ containing the points $(u_i, v_i)$, such that

$$s(u_i, v_i) \approx (x_i, y_i, z_i).$$

A choice of parameterization is shown in Figure 1b and a least squares surface approximation
using bicubic B-splines is shown in Figure 1c.

Notice that the choice of parameter points $(u_i, v_i)$ uniquely determines a piecewise
linear map $\phi : D_T \to \mathbb{R}^2$, where $D_T$ is the union of the triangles in $T$. In practice,
a necessary requirement on $\phi$ to ensure adequate quality of the subsequent surface
approximation $s(u, v)$ is that $\phi$ should be injective. In Figure 1b the mapping $\phi$ was taken
to be a so-called convex combination map, which, as we will see later, is guaranteed to
be one-to-one since the boundary of $T$ is mapped to a rectangle. Put another way, none
of the triangles in Figure 1b are 'folded over'. In fact further properties of the map are
important, such as linear precision, and this was achieved in Figure 1b by using the
so-called shape-preserving weights (the coefficients in the convex combinations). For a
discussion of that, see [3].

Another application of piecewise linear maps is to image morphing. Image morphing
can be carried out by continuously transforming one planar triangulation $T^0$ (whose
vertices represent feature points in the image) to another, $T^1$. Here we assume that
there is a one-to-one correspondence between the vertices, edges, and triangles of $T^0$ and
$T^1$. We can view each intermediate triangulation $T(t)$, $0 \le t \le 1$, (where $T(0) = T^0$
and $T(1) = T^1$) as the image of a piecewise linear map $\phi(t) : D_{T^0} \to D_{T(t)}$. As with
parameterizations, it is again important that $\phi(t)$ is one-to-one. Figure 2 shows a
so-called convex combination morph of [4] of two given planar triangulations: $T^0$ appears
on the left and $T^1$ on the right. The two triangulations in the middle are $T(1/3)$ and
$T(2/3)$. This morph ensures that $\phi(t)$ is one-to-one for all $t$ in $[0, 1]$ and therefore $T(t)$
has no 'folded' triangles at any time instant $t$.

Piecewise  linear  maps  also  arise  in:  texture  mapping;  numerical  grid  generation;  and 
in  setting  up  multiresolution  frameworks  (nested  spaces  of  piecewise  linear  functions) 
for  manifold  surface  triangulations  in  computer  graphics. 


Fig. 1. Spatial triangulation (1a), Convex combination parameterization (1b), Bicubic
spline approximation (1c).


Fig.  2.  Convex  combination  morph. 

2  Convex  combination  maps 

For the sake of simplicity we will only discuss convex combination maps defined over
planar triangulations even though all the results hold equally well when the domain of the
map is a spatial triangulation such as that in Figure 1a. Thus let $T = \{T_1, \ldots, T_m\}$ be a
simply-connected planar triangulation, with closed triangles $T_i$, and let $D_T = \bigcup_{i=1}^{m} T_i$,
as in Figure 3. We will call a mapping $\phi : D_T \to \mathbb{R}^2$ a convex combination map if
it is piecewise linear over $T$ and, for every interior vertex $v$ of $T$, there exist weights
$\lambda_{vw} > 0$, for $w \in N_v$, such that

$$\sum_{w \in N_v} \lambda_{vw} = 1,$$

and

$$\phi(v) = \sum_{w \in N_v} \lambda_{vw} \phi(w), \qquad (1)$$

where $N_v$ is the set of neighbours of $v$; see Figure 3.


Fig.  3.  Convex  combination  map. 


In applications, the mapped boundary vertices $\phi(v)$ are chosen first. Then the weights
$\lambda_{vw}$ are all specified according to some chosen strategy. Finally the mapped interior
vertices are found by treating the equations in (1) as a linear system.
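The three steps above can be sketched in a few lines. The following is only an illustration under stated assumptions: the function name `convex_combination_map`, the dictionary-based data layout and the small example mesh are hypothetical, and the linear system (1) is solved by plain Gauss-Seidel sweeps rather than by any particular solver used in the cited work.

```python
def convex_combination_map(neighbours, boundary_pos, weights=None, sweeps=200):
    """Solve phi(v) = sum_w lambda_vw * phi(w) for the interior vertices.

    neighbours:   dict vertex -> list of neighbouring vertices
    boundary_pos: dict boundary vertex -> (x, y), chosen in advance
    weights:      dict (v, w) -> lambda_vw; defaults to 1/d_v (equal weights)
    """
    interior = [v for v in neighbours if v not in boundary_pos]
    phi = dict(boundary_pos)
    for v in interior:
        phi[v] = (0.0, 0.0)  # arbitrary starting guess
    for _ in range(sweeps):  # Gauss-Seidel sweeps over the equations (1)
        for v in interior:
            x = y = 0.0
            for w in neighbours[v]:
                lam = weights[(v, w)] if weights else 1.0 / len(neighbours[v])
                x += lam * phi[w][0]
                y += lam * phi[w][1]
            phi[v] = (x, y)
    return phi


# Hypothetical example: a unit square (vertices 0..3) with one interior vertex 4.
neighbours = {0: [1, 3, 4], 1: [0, 2, 4], 2: [1, 3, 4], 3: [0, 2, 4],
              4: [0, 1, 2, 3]}
boundary_pos = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (1.0, 1.0), 3: (0.0, 1.0)}

phi = convex_combination_map(neighbours, boundary_pos)        # equal weights
unequal = {(4, 0): 0.4, (4, 1): 0.3, (4, 2): 0.2, (4, 3): 0.1}
phi_w = convex_combination_map(neighbours, boundary_pos, unequal)
```

With equal weights the interior vertex lands at the centroid of its neighbours; with the unequal weights it is pulled towards the more heavily weighted corners while staying inside their convex hull.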

Example 2.1 If an interior vertex $v$ of $T$ has five neighbours $v_1, \ldots, v_5$, then we might
set

4>{y)  =  ^4>{v  x)  +  +  ^4>{v  5). 

Until recently, the only theory behind convex combination maps was that of Tutte [8].
Working from a purely graph-theoretic point of view, Tutte proposed a so-called barycentric
mapping for constructing straight line drawings of 3-connected graphs (which
include triangulations). A barycentric mapping in our context is simply a convex
combination map in which all the weights at each vertex are equal, i.e., $\lambda_{vw} = 1/d_v$, where
$d_v$ is the degree or valency of the vertex $v$. Thus for $v$ in Example 2.1 we must have

$$\phi(v) = \tfrac{1}{5}\phi(v_1) + \tfrac{1}{5}\phi(v_2) + \tfrac{1}{5}\phi(v_3) + \tfrac{1}{5}\phi(v_4) + \tfrac{1}{5}\phi(v_5).$$

Tutte  showed  that  a  valid  straight  line  drawing,  i.e.  one  with  no  edge  crossings,  results 
from  a  barycentric  mapping  if  the  ‘boundary’  of  the  graph,  a  so-called  ‘cycle’,  is  mapped 
to  a  convex  polygon.  However,  as  argued  in  [3],  convex  combination  maps  share  all  those 
properties  of  barycentric  maps  necessary  for  Tutte’s  proof.  Thus  when  interpreted  in  the 
right  way  and  suitably  generalized,  Tutte’s  theorem  can  be  expressed  in  the  following 
way. 

Theorem 2.2 Suppose $\phi : D_T \to \mathbb{R}^2$ is a convex combination mapping which maps the
$n$ boundary vertices of $T$ cyclically into the $n$ vertices of some $n$-sided convex polygon in
the plane. Then $\phi$ is one-to-one.
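In practice, the absence of 'folded over' triangles can be checked numerically once the mapped vertices are known. The sketch below is an assumption-laden illustration, not part of the cited theory: it assumes every triangle is listed with its vertices in counter-clockwise order in the domain, and the names `signed_area` and `folded_triangles` are hypothetical. A non-positive signed area of a mapped triangle signals a degenerate or flipped triangle.

```python
def signed_area(p, q, r):
    """Signed area of triangle pqr (positive if p, q, r are counter-clockwise)."""
    return 0.5 * ((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1]))


def folded_triangles(triangles, phi):
    """Triangles whose image under phi is degenerate or has flipped orientation."""
    return [t for t in triangles
            if signed_area(*(phi[v] for v in t)) <= 0.0]


# Hypothetical example: a square with a centre vertex 4.
triangles = [(0, 1, 4), (1, 2, 4), (2, 3, 4), (3, 0, 4)]
corners = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (1.0, 1.0), 3: (0.0, 1.0)}

good = dict(corners)
good[4] = (0.5, 0.5)   # centre is a convex combination of the corners
bad = dict(corners)
bad[4] = (1.5, 0.5)    # centre pushed outside the square

ok_result = folded_triangles(triangles, good)
bad_result = folded_triangles(triangles, bad)
```

Placing the centre vertex outside the square flips the triangle adjacent to the right-hand edge, which is exactly the situation a convex combination map with a convex image boundary rules out.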

Despite  this  generalization,  however,  there  are  still  two  aspects  of  it  which  need  to 
be  improved  from  the  point  of  view  of  applications  and  future  research. 

The first is that we would like to extend the theorem so that we can allow some, and
indeed many, of the mapped boundary vertices to be collinear. Indeed in the application
to parameterization for surface fitting, it might be convenient to map all the boundary
vertices of the given triangulation into the four sides of a rectangle, as in Figure 1b.
This is because tensor-product spline surfaces are defined over rectangular domains.
Collinearity will also often be desirable in morphing, as in Figure 2, and in most other
applications. Thus a drawback of Theorem 2.2 is that it does not allow collinear vertices
in the image boundary.

The second aspect is that we would like to simplify the proof in order to have some
hope of establishing the injectivity of piecewise linear maps in even more general
situations, such as when mapping to non-convex regions, or when some of the mapped
vertices are constrained, for example. The fact that Tutte's proof relies on the non-existence
of the Kuratowski subgraphs $K_5$ and $K_{3,3}$ in a planar graph illustrates its complexity.

It is these two improvements that are the focus of [5]. The main idea of [5] is the
observation that Theorem 2.2 is very similar to a theorem on harmonic maps, referred
to by Duren and Hengartner [2] as the Rado-Kneser-Choquet theorem, which was
established in [7, 6, 1]. Recall that a mapping $\phi : D \to \mathbb{R}^2$, with $D \subset \mathbb{R}^2$ and $\phi = (u, v)$,
is harmonic if both its components $u(x,y)$ and $v(x,y)$ satisfy the Laplace equation in
$D$, i.e.,

$$u_{xx} + u_{yy} = 0, \qquad v_{xx} + v_{yy} = 0,$$

see Figure 4.

Rado-Kneser-Choquet Theorem. Suppose $\phi : D \to \mathbb{R}^2$ is a harmonic mapping
which maps the boundary $\partial D$ homeomorphically into the boundary $\partial\Omega$ of some convex
region $\Omega \subset \mathbb{R}^2$. Then $\phi$ is one-to-one.


Fig.  4.  Harmonic  map. 


This suggested that a proof of Theorem 2.2 might be based on a proof of the
Rado-Kneser-Choquet theorem, in particular the short proof of Kneser [6]. Kneser's proof
begins by showing that $\phi$ is locally one-to-one in the sense that the Jacobian of $\phi$,

$$\begin{vmatrix} u_x & u_y \\ v_x & v_y \end{vmatrix},$$

never vanishes. Kneser establishes this by supposing that the Jacobian is zero at some
point $(x_0, y_0)$. In that case there must be a straight line $ax + by + c = 0$ passing through
the point $\phi(x_0, y_0)$ such that both partial derivatives of the function $h(x, y) = au(x, y) +
bv(x, y) + c$ are zero at $(x_0, y_0)$. At the same time, the function $h : D \to \mathbb{R}$ is zero at
$(x_0, y_0)$ and has just two zeros along the boundary of $D$. Noting that $h(x, y)$ is a harmonic
function, Kneser then uses the Nodal Lines theorem of Courant to argue that there are at
least four zero contours of $h$ emanating from $(x_0, y_0)$ and, due to the maximum principle
for $h$, these four curves can never self-intersect nor intersect one another. Therefore all
four curves must reach the boundary of $D$, which is a contradiction.

These ideas were used in [5] to establish a much simpler proof of Theorem 2.2 than
that of Tutte. No graph theory is needed at all. Instead, the discrete maximum principle
for convex combination functions plays the role of the maximum principle for harmonic
functions. Similar to Kneser's proof we show first that $\phi$ is locally one-to-one, except
that we understand this to mean that the restriction of $\phi$ to any quadrilateral in $T$ is
one-to-one, a quadrilateral being the union of two triangles sharing a common edge.




Moreover, Theorem 2.2 is generalized in [5] to allow collinear mapped boundary
vertices. We call an edge $[v, w]$ of $T$ a dividing edge if both endpoints $v$ and $w$ are
boundary vertices yet the edge $[v, w]$ itself is not contained in the boundary. For example,
the triangulation in Figure 5 has only one dividing edge. Dividing edges play a
critical role because they partition the triangulation into subtriangulations, in each
of which every convex combination function satisfies a discrete maximum principle in its
strong form. The main result of [5] was the following.

Theorem 2.3 Suppose $T$ is any triangulation and that $\phi : D_T \to \mathbb{R}^2$ is a convex
combination mapping which maps $\partial D_T$ homeomorphically into the boundary $\partial\Omega$ of some
convex region $\Omega \subset \mathbb{R}^2$. Then $\phi$ is one-to-one if and only if no dividing edge $[v, w]$ of $T$
is mapped by $\phi$ into $\partial\Omega$.
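The dividing edges that appear in the condition of Theorem 2.3 can be found directly from the list of triangles. The sketch below assumes a hypothetical representation (triangles as triples of vertex labels) and a hypothetical function name `dividing_edges`; it uses the elementary fact that a boundary edge belongs to exactly one triangle while an interior edge belongs to two.

```python
from collections import Counter


def dividing_edges(triangles):
    """Return the dividing edges of a triangulation given as vertex triples."""
    count = Counter()
    for a, b, c in triangles:
        for e in ((a, b), (b, c), (c, a)):
            count[frozenset(e)] += 1
    # A boundary edge belongs to exactly one triangle.
    boundary_edges = [e for e, k in count.items() if k == 1]
    boundary_vertices = set().union(*boundary_edges)
    # Dividing edge: an interior edge whose two endpoints are boundary vertices.
    return sorted(tuple(sorted(e)) for e, k in count.items()
                  if k == 2 and e <= boundary_vertices)


# Hypothetical examples.
square_with_diagonal = [(0, 1, 2), (0, 2, 3)]
square_with_centre = [(0, 1, 4), (1, 2, 4), (2, 3, 4), (3, 0, 4)]

diag_result = dividing_edges(square_with_diagonal)
centre_result = dividing_edges(square_with_centre)
```

In the first example the diagonal joins two boundary vertices, so mapping vertices 0 and 2 onto the same side of a rectangle would send that edge into the image boundary and violate the condition of the theorem; the second triangulation has no dividing edge at all.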

3  Future  research 

Here is a list of topics for future research.

•  A triangulation is a special (maximal) kind of planar graph. Can one extend
Theorem 2.3 to other planar graphs, for example, rectangular grids? This is likely
because Tutte's theory already holds for all 3-connected graphs.

•  In what way can the theorem be extended from bivariate maps to trivariate ones?

•  Can similar one-to-one maps be guaranteed when mapping closed surfaces of various
topology? For example, we would like to map a closed manifold triangulation,
homeomorphic to a sphere, into a unit sphere injectively. Here each triangle in the
triangulation would be mapped to a spherical triangle on the surface of the sphere.

•  Can one find sufficient conditions for the injectivity of constrained maps, i.e., piecewise
linear maps in which the image of certain interior points is specified in advance?

•  Can one remove the requirement of having to map the boundary to a convex polygon
and still ensure a one-to-one mapping under some weaker condition?

•  Can the Rado-Kneser-Choquet theorem and Theorem 2.3 be combined as part of
a single more general theorem?

Abstract

A  survey  is  given  of  algorithms  for  passing  a  curve  through  data  points  so  as  to  preserve 
the  shape  of  the  data. 


1  Introduction 

We  consider  the  problem  of  passing  a  curve  through  a  finite  sequence  of  points.  We  want 
the  curve  to  preserve  in  some  sense  the  shape  of  the  data,  i.e.  the  shape  of  the  curve 
gained  by  joining  the  data  by  straight  line  segments  (which  we  call  the  ‘piecewise  linear 
interpolant’).  We  do  not  consider  the  important  problems  of  approximating  the  data 
by  a  curve,  or  of  shape-preserving  interpolation  by  a  surface.  The  short  length  of  the 
paper  forces  it  to  be  selective.  So  we  concentrate  on  actual  algorithms  for  solving  the 
problem  rather  than  related  theory.  Also  we  consider  only  algorithms  where  the  curve 
is  defined  explicitly,  not  implicitly  either  as  the  zero  set  of  a  function  or  as  the  limit  of 
a  subdivision  process  (though  there  are,  to  our  knowledge,  extremely  few  such  implicit 
shape-preserving  schemes). 

In  Section  2,  we  consider  planar  curves  given  by  a  function  y  =  f(x ),  often  rather 
misleadingly  referred  to  as  ‘functional  interpolation’.  There  are  numerous  such  schemes, 
dating  from  1966,  with  most  of  them  prior  to  1990.  Our  treatment  is  therefore  very 
selective.  Section  3  deals  with  parametrically  defined  planar  curves,  for  which  the  schemes 
are  fewer  and  more  recent.  Finally,  in  Section  4,  we  consider  curves  in  three  dimensions, 
often  called  ‘space  curves’.  Here  the  work  is  much  more  limited,  dating  only  from  1997. 

We  note  that  in  shape-preserving  interpolation,  the  map  from  the  data  to  the  function 
describing  the  curve  must  be  non-linear.  In  what  we  call  ‘tension  methods’  the  curve 
can  be  constructed  by  a  linear  scheme  for  any  choice  of  certain  ‘tension  parameters’. 
These  parameters  are  then  varied  so  as  to  ‘pull’  the  curve  towards  the  piecewise  linear 
interpolant  until  the  shape  criteria  are  satisfied.  Though  there  are  a  few  variations  on 
this  theme,  there  is  generally  a  clear  distinction  between  tension  methods  and  other 
schemes,  which  we  shall  term  ‘direct  methods’. 

2  Functional  interpolation 

Given data

$$(x_i, y_i) \in \mathbb{R}^2, \quad i = 0, \ldots, N, \qquad x_0 < x_1 < \cdots < x_N, \qquad (2.1)$$

we consider a function $f : [x_0, x_N] \to \mathbb{R}$ satisfying

$$f(x_i) = y_i, \quad i = 0, \ldots, N. \qquad (2.2)$$

For some reason, perhaps the physical situation which $f$ is intended to model, we
may wish the graph of $f$ to inherit certain shape properties of the data. We now describe
these and other properties which it may be desirable for $f$ to possess.

2.1 Desirable properties

Monotonicity. Here we require $f$ to be increasing (respectively decreasing) if $(y_i)$ is
increasing (respectively decreasing). More generally we may require the scheme to be
'co-monotone', i.e. for $i = 0, \ldots, N-1$, $f$ is increasing (decreasing) on $[x_i, x_{i+1}]$ if
$y_i \le y_{i+1}$ ($y_i \ge y_{i+1}$). Co-monotonicity has the consequence that the local extrema of
$f$ occur exactly at the local extrema of $(y_i)$. Moreover if $y_i = y_{i+1}$, then $f$ is constant
on $[x_i, x_{i+1}]$. These properties may be too restrictive and a weaker alternative is what
we call 'local monotonicity': for $i = 1, \ldots, N-2$, $f$ is increasing on $[x_i, x_{i+1}]$ if
$y_{i-1} < y_i < y_{i+1} < y_{i+2}$ (and similarly for decreasing). Although this is not generally
stated, it is also desirable that for $i = 0, \ldots, N-1$, $f$ has at most one local extremum
on $(x_i, x_{i+1})$.

Convexity. Here we require $f$ to be convex (concave) if the piecewise linear interpolant is
convex (concave). More generally we call the scheme 'co-convex' if for $i = 1, \ldots, N-2$,
$f$ is convex (concave) on $[x_i, x_{i+1}]$ if the piecewise linear interpolant is convex (concave)
on $[x_{i-1}, x_{i+2}]$. It is also desirable in a co-convex scheme for $f$ to have at most one
inflection in $(x_i, x_{i+1})$, $0 \le i \le N-1$.
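The shape of the data referred to in the monotonicity and convexity properties is the shape of the piecewise linear interpolant, which can be read off from its slopes. The following minimal sketch illustrates this; the function names `slopes` and `data_shape` and the sample data are hypothetical and are introduced only for illustration.

```python
def slopes(xs, ys):
    """Slopes of the piecewise linear interpolant on each [x_i, x_{i+1}]."""
    return [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
            for i in range(len(xs) - 1)]


def data_shape(xs, ys):
    """Global shape properties of the piecewise linear interpolant."""
    s = slopes(xs, ys)
    pairs = list(zip(s, s[1:]))  # consecutive slopes
    return {
        "increasing": all(a >= 0 for a in s),
        "decreasing": all(a <= 0 for a in s),
        "convex": all(a <= b for a, b in pairs),   # slopes non-decreasing
        "concave": all(a >= b for a, b in pairs),  # slopes non-increasing
    }


# Hypothetical data: samples of |x| at the integers -2, ..., 2.
xs = [-2, -1, 0, 1, 2]
ys = [abs(x) for x in xs]
shape = data_shape(xs, ys)
```

For these samples the slopes are -1, -1, 1, 1, so the data are convex but neither increasing nor decreasing; a shape-preserving scheme applied to them would be required to return a convex interpolant.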

Smoothness. By definition, the piecewise linear interpolant is shape-preserving, and so
the problem is trivial unless we require $f$ to have greater smoothness than continuity, i.e.
$C^k$ for $k \ge 1$. Since all the schemes use piecewise analytic functions, the $C^k$ condition
needs to be checked only at a finite number of 'knots', which generally include the data
points. We remark that smoothness and shape-preservation may not be compatible; e.g.
if for $i = 0, \ldots, 4$, $x_i = i - 2$, $y_i = |x_i|$, and $f$ is convex on $[x_0, x_4]$, then $f(x) = |x|$,
$-2 \le x \le 2$, and so is not $C^1$ at 0.

Approximation order. It is generally supposed that the data arise as values of some
unknown 'smooth' function $g$, i.e. $y_i = g(x_i)$, $i = 0, \ldots, N$. Then we can consider how
fast the interpolant $f$ converges to $g$ as we increase the density of data values $x_i$ in the
fixed interval $[a, b]$. A scheme has approximation order $O(h^m)$ if $\|f - g\| = O(h^m)$, where
$h = \max\{x_{i+1} - x_i : i = 0, \ldots, N-1\}$ and the usual norm is $\|F\| = \sup\{|F(x)| : a \le
x \le b\}$.

Locality. In a 'global' scheme, the value $f(x)$, for any $x$, generally depends on all the
data. In contrast, for a 'local' scheme, $f(x)$ depends on the data values $(x_i, y_i)$ only for
$x_i$ 'near' $x$. There may be advantages in local schemes, e.g. when data are modified or
inserted.

Fairness. It is often desirable that the curve is 'fair', i.e. pleasing to the eye, see Section 3.

Other desirable properties are invariance under scaling or reflection in $x$ or $y$, and
stability, i.e. small changes in the data produce small changes in $f$. There may also be
other constraints on $f$, e.g. $f > 0$ when $y_i > 0$, $i = 0, \ldots, N$.

2.2  Tension  methods 

Many tension methods are a modification of cubic spline interpolation, which we now
describe. Given data (2.1), there is a unique function $f$ satisfying (2.2), where $f$ is $C^2$,
is a cubic polynomial on $[x_i, x_{i+1}]$, $i = 0, \ldots, N-1$, and satisfies suitable boundary
conditions at $x_0$ and $x_N$. The function $f$ minimises $\int_{x_0}^{x_N} (f''(x))^2\,dx$ over a suitable class of
functions and this energy minimisation property is generally considered to give a fair
curve. Determining $f$ requires solving a global, strictly diagonally dominant tridiagonal
system of linear equations.
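To make the tridiagonal system concrete, here is a minimal sketch for one particular choice of boundary conditions, the 'natural' ones $f''(x_0) = f''(x_N) = 0$ (an assumption made here for illustration; the text only asks for suitable boundary conditions). The unknowns are the second derivatives $M_i = f''(x_i)$, and the function names `natural_spline_moments` and `evaluate_spline` are hypothetical.

```python
def natural_spline_moments(xs, ys):
    """Second derivatives M_i of the natural cubic spline (M_0 = M_N = 0)."""
    n = len(xs) - 1                      # number of intervals, needs n >= 2
    h = [xs[i + 1] - xs[i] for i in range(n)]
    # Row for M_i (i = 1..n-1):
    #   h[i-1] M_{i-1} + 2 (h[i-1] + h[i]) M_i + h[i] M_{i+1} = rhs_i
    sub = [h[i - 1] for i in range(1, n)]
    diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
    sup = [h[i] for i in range(1, n)]
    rhs = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
           for i in range(1, n)]
    # Thomas algorithm: forward elimination ...
    for k in range(1, n - 1):
        w = sub[k] / diag[k - 1]
        diag[k] -= w * sup[k - 1]
        rhs[k] -= w * rhs[k - 1]
    # ... and back substitution (m[n] stays 0 by the boundary condition).
    m = [0.0] * (n + 1)
    for k in range(n - 2, -1, -1):
        m[k + 1] = (rhs[k] - sup[k] * m[k + 2]) / diag[k]
    return m


def evaluate_spline(xs, ys, m, x):
    """Evaluate the cubic spline with moments m at x in [x_0, x_N]."""
    i = max(j for j in range(len(xs) - 1) if xs[j] <= x)
    h = xs[i + 1] - xs[i]
    a, b = xs[i + 1] - x, x - xs[i]
    return ((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h)
            + (ys[i] / h - m[i] * h / 6.0) * a
            + (ys[i + 1] / h - m[i + 1] * h / 6.0) * b)


# Hypothetical data: three points forming a 'hat'.
xs, ys = [0.0, 1.0, 2.0], [0.0, 1.0, 0.0]
m = natural_spline_moments(xs, ys)
mid_value = evaluate_spline(xs, ys, m, 0.5)
```

For this small data set the system has a single unknown, and on $[0, 1]$ the spline is $1.5x - 0.5x^3$. The strict diagonal dominance of the matrix is what makes the elimination above safe without pivoting.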

Since cubic spline interpolation is not shape-preserving, in 1966 Schweikert [67] modified
the scheme by replacing cubic polynomials on each interval $[x_i, x_{i+1}]$ by solutions
of

$$f^{(4)} - \lambda_i^2 f'' = 0,$$

where $\lambda_i \ge 0$. When $\lambda_i = 0$, $f$ will reduce to a cubic, while as $\lambda_i \to \infty$, $f$ approaches
a linear polynomial. Thus $\lambda_i$ acts as a tension parameter and by making appropriate
choices of $\lambda_i$ large enough the function will preserve monotonicity and/or convexity
globally or locally.
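As a short worked step, and assuming the differential equation has the standard tension-spline form $f^{(4)} - \lambda_i^2 f'' = 0$ with $\lambda_i > 0$ (the squared parameter is an assumption made for this sketch), the general solution on one interval can be written down explicitly. It shows where the exponential (hyperbolic) functions come from; the constants $A, B, a, b, c, d$ are generic.

```latex
\begin{align*}
f^{(4)} - \lambda_i^2 f'' = 0
  &\;\Longrightarrow\; g'' = \lambda_i^2\, g, \qquad g := f'', \\
g(x) &= A\,e^{\lambda_i x} + B\,e^{-\lambda_i x}, \\
f(x) &= a + b\,x + c\,\sinh\bigl(\lambda_i (x - x_i)\bigr)
        + d\,\cosh\bigl(\lambda_i (x - x_i)\bigr).
\end{align*}
```

For $\lambda_i = 0$ the equation becomes $f^{(4)} = 0$, whose solutions are the cubic polynomials, in agreement with the cubic spline case.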

Many papers have been written on Schweikert's tension splines giving, for example,
ways of choosing the values of the tension parameters, e.g. [68,57,46,60]. However the fact
that the method uses exponential functions can be seen as a drawback. An alternative was
introduced by Nielson in 1974 [55] by adjusting the minimisation property of cubic splines
to a minimisation problem involving also the first derivative. The resulting function,
called a $\nu$-spline, is also cubic on each interval $[x_i, x_{i+1}]$ but only $C^1$. However the form
of the $C^1$ continuity gives extra 'smoothness' for parametrically defined curves and so
we discuss $\nu$-splines further in Section 3. By generalising the minimisation problem still
further one can gain a $C^1$ piecewise cubic interpolant with further parameters for gaining
shape properties [22].

The idea of using rational functions in tension methods was introduced by Späth [69],
also in 1974, and put in a general setting of tension methods in [57]. From 1982-1988,
Gregory and/or Delbourgo produced a series of algorithms using rational functions, e.g.
[19,36,20,21,18]. We illustrate the ideas with an algorithm from [37]. Here $f$ is $C^2$ and
on each interval $[x_i, x_{i+1}]$ it has the form, for some $a, b, c, d$,

$$f(x) = \frac{a + bt + ct^2 + dt^3}{1 + \lambda_i t(1 - t)}, \qquad t = \frac{x - x_i}{x_{i+1} - x_i}.$$

For $\lambda_i > -1$, $i = 0, \ldots, N-1$, $f$ can be determined as the solution of a strictly diagonally
dominant tridiagonal linear system (and hence the scheme is global). When all $\lambda_i = 0$,
$f$ reduces to the usual cubic spline interpolant, while as $\lambda_i \to \infty$, $f$ converges uniformly
to the linear interpolant on $[x_i, x_{i+1}]$. In general the approximation order is $O(h^2)$ for
data from a $C^4$ function. In the special case of monotone data, choosing

$$\lambda_i = \mu_i + \bigl(f'(x_i) + f'(x_{i+1})\bigr)\frac{x_{i+1} - x_i}{y_{i+1} - y_i}, \qquad \mu_i > -3, \quad i = 0, \ldots, N-1,$$

ensures that $f$ is correspondingly monotone, and for the choice $\mu_i = -2$, $f$ reduces to a
rational quadratic which gives optimal approximation order $O(h^4)$. Similarly for convex
data, $f$ is also convex provided that each $\lambda_i$ satisfies an inequality involving $f'(x_i)$,
$f'(x_{i+1})$, and choosing $\lambda_i$ appropriately (which requires solving a non-linear equation)
further ensures approximation order $O(h^4)$.

There  are  some  more  recent  methods  involving  rationals,  e.g.  [58]. 

The idea of using variable degree to preserve shape was introduced by McAllister,
Passow and Roulier in 1977 [47,56]. They produce monotone, convex schemes of arbitrarily
high smoothness by constructing a shape-preserving piecewise linear interpolant $l$
with one knot between any two data points (and no knots at the data points) and then
defining the final interpolant on each interval $[x_i, x_{i+1}]$ as the Bernstein polynomial of $l$
of some degree $m_i$. The idea was extended from 1986 by Costantini [8-10]. For $k \ge 1$,
$m_i \ge 2k + 1$, $i = 0, \ldots, N-1$, he constructs a shape-preserving piecewise linear
interpolant $l$ with knots at $x_i + k(x_{i+1} - x_i)/m_i$ and $x_{i+1} - k(x_{i+1} - x_i)/m_i$, $i = 0, \ldots, N-1$.
The final interpolant $f$ coincides on each interval $[x_i, x_{i+1}]$ with the Bernstein polynomial
of $l$ of degree $m_i$ and is hence $C^k$ (with $f^{(j)}(x_i) = 0$, $j = 2, \ldots, k$). In [10] there is
a co-monotone, co-convex scheme in which the degrees $m_i$ can either be chosen a priori
or computed automatically according to the data.
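The key ingredient of these variable degree schemes, the Bernstein polynomial of a piecewise linear function, is easy to experiment with. The sketch below works on the normalised interval $[0, 1]$; the function `pl` (one interior knot at 0.5) and the degree 7 are hypothetical choices made only for illustration, not the constructions of the cited papers.

```python
from math import comb


def bernstein(l, m, t):
    """Bernstein polynomial of degree m of the function l on [0, 1], at t."""
    return sum(l(j / m) * comb(m, j) * t ** j * (1 - t) ** (m - j)
               for j in range(m + 1))


def pl(t):
    """Hypothetical increasing, convex piecewise linear function, knot at 0.5."""
    return 0.0 if t <= 0.5 else 2.0 * t - 1.0


# Sample the degree-7 Bernstein polynomial of pl on a uniform grid.
vals = [bernstein(pl, 7, k / 20) for k in range(21)]
increasing = all(vals[k] <= vals[k + 1] + 1e-12 for k in range(20))
convex = all(vals[k - 1] - 2 * vals[k] + vals[k + 1] >= -1e-12
             for k in range(1, 20))
```

The Bernstein polynomial reproduces the end values of `pl` and, on this sample, inherits both its monotonicity and its convexity while being a single smooth polynomial, which is the property the variable degree schemes exploit.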

The above schemes using variable degree are not strictly tension schemes in our sense
but in 1990, Kaklis and Pandelis [40] introduced a tension method by using the above
form for $k = 1$, i.e. on each interval $[x_i, x_{i+1}]$ it has the form:

$$f(x) = f(x_i)(1 - t) + f(x_{i+1})t + c_i t(1 - t)^{m_i} + d_i t^{m_i}(1 - t), \qquad t = \frac{x - x_i}{x_{i+1} - x_i}.$$

Here $m_i \ge 2$ is an integer and for each choice of $m_0, \ldots, m_{N-1}$, the numbers $c_i$, $d_i$ are
chosen so that $f$ is $C^2$, which requires the solution of a strictly diagonally dominant
tridiagonal linear system. When all $m_i = 2$, this reduces to the usual cubic spline
interpolant, while as $m_i \to \infty$, $f$ converges uniformly to the linear interpolant on $[x_i, x_{i+1}]$
with order $O(m_i^{-1})$ (or $O(m_i^{-2})$ if $m_{i-1}$, $m_{i+1}$ remain bounded). For further discussion
of variable degree shape-preserving functional interpolation, see [11].
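The tension effect of the degree can be seen from the form of a single segment. In the sketch below the coefficients `c` and `d` are held fixed, which is a simplifying assumption: in the actual scheme they are determined by the $C^2$ conditions and change with the degrees. The function names are hypothetical.

```python
def variable_degree_segment(y0, y1, c, d, m, t):
    """f(t) = y0 (1 - t) + y1 t + c t (1 - t)^m + d t^m (1 - t), t in [0, 1]."""
    return (y0 * (1 - t) + y1 * t
            + c * t * (1 - t) ** m + d * t ** m * (1 - t))


def max_deviation(c, d, m, samples=2001):
    """Largest sampled distance between the segment and its chord."""
    ts = [k / (samples - 1) for k in range(samples)]
    return max(abs(variable_degree_segment(0.0, 1.0, c, d, m, t) - t)
               for t in ts)


# Deviation from the linear interpolant for increasing degree (c = 1, d = 0).
deviations = [max_deviation(1.0, 0.0, m) for m in (2, 8, 32)]
```

For $m = 2$ the segment is an ordinary cubic and the term $t(1 - t)^2$ peaks at $4/27$; as $m$ grows, the peak of $t(1 - t)^m$ equals $m^m/(m + 1)^{m+1}$ and shrinks roughly like $1/(e\,m)$, so the segment is pulled towards the chord.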

Our final type of tension method was introduced by Manni [50] in 1996. The general
idea is to define $f$ on $[x_i, x_{i+1}]$ as

$$f(x) = p_i(q_i^{-1}(x)),$$

where $p_i$, $q_i$ are cubic polynomials on $[x_i, x_{i+1}]$ and $q_i$ is strictly increasing from $[x_i, x_{i+1}]$
onto itself, so that the inverse $q_i^{-1}$ is well-defined on $[x_i, x_{i+1}]$. For $f'(x_i) = d_i$,
$i = 0, \dots, N$, we require

$$p_i'(x_i) = \lambda_i d_i, \quad q_i'(x_i) = \lambda_i, \quad p_i'(x_{i+1}) = \mu_i d_{i+1}, \quad q_i'(x_{i+1}) = \mu_i,$$

for parameters $\lambda_i \ge 0$, $\mu_i \ge 0$. For $\lambda_i = \mu_i = 1$, we have $q_i(x) = x$ and $f$ reduces to a
cubic on $[x_i, x_{i+1}]$, while for $\lambda_i = \mu_i = 0$, $f$ becomes linear on $[x_i, x_{i+1}]$.
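The two limiting cases can be checked with a small sketch of one segment. It is written under stated assumptions: `p` and `q` are taken as cubic Hermite interpolants with the end slopes scaled by the two tension parameters, `q` is inverted by bisection, and the code assumes without checking that `q` is strictly increasing. The names `hermite_cubic` and `manni_segment` are hypothetical and the code is not taken from [50].

```python
def hermite_cubic(x0, x1, y0, y1, s0, s1):
    """Cubic on [x0, x1] with end values y0, y1 and end slopes s0, s1."""
    h = x1 - x0
    def p(x):
        t = (x - x0) / h
        h00 = 2 * t**3 - 3 * t**2 + 1
        h10 = t**3 - 2 * t**2 + t
        h01 = -2 * t**3 + 3 * t**2
        h11 = t**3 - t**2
        return h00 * y0 + h10 * h * s0 + h01 * y1 + h11 * h * s1
    return p

def manni_segment(x0, x1, y0, y1, d0, d1, lam, mu):
    """f = p o q^{-1} on [x0, x1]; lam, mu >= 0 are the tension parameters."""
    p = hermite_cubic(x0, x1, y0, y1, lam * d0, mu * d1)
    q = hermite_cubic(x0, x1, x0, x1, lam, mu)   # maps [x0, x1] onto itself
    def f(x):
        lo, hi = x0, x1
        for _ in range(100):                     # bisection for q(z) = x
            mid = 0.5 * (lo + hi)
            if q(mid) < x:
                lo = mid
            else:
                hi = mid
        return p(0.5 * (lo + hi))
    return f
```

With both parameters equal to 1 the sketch returns the cubic Hermite interpolant, and with both equal to 0 it returns the chord.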
In [50], the values $d_0, \dots, d_N$ are assumed known (or estimated from the data values)
and the scheme is local $C^1$, gives necessary and sufficient conditions for the values of the
parameters $\lambda_i$, $\mu_i$ for co-monotonicity, and has approximation order $O(h^2)$ when $g$ is $C^2$
and generally $O(h^4)$ when $g$ is $C^4$.

Manni  and  co-workers  have  written  a  series  of  papers  using  the  same  idea,  [51,53,54]. 
For example in [45], the values $d_i$ are not assumed given but are chosen to ensure that
the function is $C^2$, thus providing a locally monotone, co-convex global scheme which
generalises usual cubic spline interpolation; while in [52] two further knots are inserted
in each interval $[x_i, x_{i+1}]$ to produce a $C^2$, locally monotone, co-convex local scheme
which  interpolates  values  of  j  =  1, 2,  i  =  0, . . . ,  N. 

2.3  Direct  methods 

In  1967,  Young  [71]  considered  shape-preserving  interpolation  by  polynomials  and  a  num¬ 
ber  of  papers  have  appeared  since  on  this  topic,  e.g.  [59]  gives  a  constructive  proof  of 
the  existence  of  a  co-monotone  interpolant  with  an  upper  bound  on  the  degree  required. 
However  for  a  practical  algorithm,  using  a  piecewise  polynomial  offers  much  more  flexib¬ 
ility  than  a  single  polynomial.  Numerous  papers  have  been  written  using  such  polynomial 
splines  and  we  mention  briefly  only  a  few. 

By  inserting  extra  knots  between  data  points,  a  convexity  preserving  scheme  with 
C 2  cubics  was  given  by  de  Boor  [4,  p.303],  and  co-monotone,  co-convex  schemes  with 
C 1  quadratics  in  [48,49,66].  C 1  cubic  splines  with  knots  at  the  data  points  are  used  for 
co-monotonicity  in  [25,5,24,70],  (the  last  of  these  using  a  variational  approach),  and  for 
both  co-monotonicity  and  co-convexity  in  [16,17].  We  also  recall  the  methods  using  spline 
functions  of  variable  degree  with  knots  between  the  data  points  to  obtain  interpolants 
with  arbitrarily  high  smoothness  which  were  discussed  under  tension  methods. 

Finally  we  note  that  following  the  paper  [62]  which  was  as  early  as  1973,  Schaback 
[63]  gives  a  C 2  co-monotone,  co-convex  scheme  which  uses  a  cubic  polynomial  on  any 
interval  [xi,Xi+ 1]  where  an  inflection  is  needed,  and  on  other  intervals  employs  a  rational 
function  of  form  quadratic/linear. 

3  Planar  curves 

Given  data 

$$I_i \in \mathbb{R}^2, \quad i = 0, \dots, N,$$

we consider a curve $r : [a, b] \to \mathbb{R}^2$ satisfying

$$r(t_i) = I_i, \quad i = 0, \dots, N, \qquad (3.1)$$

for values $a = t_0 < t_1 < \cdots < t_N = b$. For a closed curve the situation is extended
periodically so that

$$I_{i+N} = I_i, \quad t_{i+N} - t_i = b - a, \quad i \in \mathbb{Z}, \qquad r(t + b - a) = r(t), \quad t \in \mathbb{R}.$$

3.1  Desirable  properties 

Shape.  For  this  case  it  is  not  usually  relevant  to  consider  preservation  of  monotonicity. 
We  say  a  scheme  is  ‘co-convex’  if  the  curve  r  has  the  minimum  number  of  inflections 
consistent with the data. In practice, schemes satisfy the somewhat stronger condition
that for any $0 \le i \le j-2 \le N-2$, $r$ is positively (negatively) locally convex on
$[t_{i+1}, t_{j-1}]$ if the polygonal arc joining $I_i, \dots, I_j$ is positively (negatively) locally convex.
For  more  details  on  this  and  other  desirable  properties,  see  [29]. 

Smoothness. We shall call the interpolating curve $C^k$ for $k \ge 0$ if the function $r$ is
$C^k$. A $C^0$ curve $r$ we shall call $G^1$ if the unit tangent vector is continuous, and $G^2$ if, in
addition, the curvature is continuous. A $C^k$ curve $r$ is $G^k$, $k = 1, 2$, provided that the
parameterisation is regular, i.e. $r'(t) \ne (0, 0)$, which is generally desirable. It is usually
sufficient to have $G^k$, rather than $C^k$, continuity if only the appearance of the curve is
important and the choice of parameter $t$ is not significant.

Fairness.  Planar  curves  often  arise  in  computer-aided  design  where  it  may  be  par¬ 
ticularly  important  that  the  curve  is  pleasing  to  the  eye.  Though  this  is  subjective, 
various  criteria  have  been  suggested  to  be  relevant,  such  as  magnitude,  rate  of  change 
or  monotonicity  of  the  curvature.  Some  schemes  include  ‘shape  parameters’  which  can 
be  manipulated  by  the  designer  to  modify  the  shape  of  the  curve. 

Approximation  order  is  not  important  in  the  context  of  design  when  the  data  are 
not  considered  to  be  taken  from  some  unknown  curve.  Approximation  order  is  related 
to  reproduction  of  polynomial  curves,  and  a  related  property  for  planar  curves  is  re¬ 
production  of  arcs  of  circles  (or  more  generally  conics);  this  cannot  be  done  exactly  by 
polynomials  but  it  can  be  achieved  by  using  rationals. 

Locality  and  other  desirable  properties  are  similar  to  the  functional  case  as  described 
in  Section  2.1,  though  it  is  generally  more  appropriate  that  the  invariance  is  under  a 
rotation  and  the  same  scaling  in  both  x  and  y. 

3.2  Tension  methods 

In Section 2.2 we mentioned Nielson's $\nu$-splines [55]. Applying this scheme for both
components of $r$ gives a function $r$ which is cubic on each interval $[t_i, t_{i+1}]$, is $C^1$ and
satisfies

$$r''(t_i^+) = r''(t_i^-) + \nu_i\, r'(t_i), \quad i = 1, \dots, N-1,$$

where $\nu_i \ge 0$. This condition is sufficient for $G^2$ continuity of $r$ (assuming regular
parameterisation). When all $\nu_i = 0$, $r$ will reduce to the usual $C^2$ cubic spline interpolant. As
$\nu_i \to \infty$, the curve is 'pulled tight' at $I_i$ and as $\nu_i, \nu_{i+1} \to \infty$, it approaches the linear
interpolant on $[t_i, t_{i+1}]$.

The scheme in [37] by Gregory which was mentioned in Section 2.2 was adapted to
the planar case in [38]. Other schemes using rationals were proposed by Clements in
[6,7], where $r$ is a $C^2$ curve which on each interval $[t_i, t_{i+1}]$ has the form, for some
$a, b, c, d \in \mathbb{R}^2$,

$$r(t) = \frac{a(1-s)^3}{w_i s + 1} + b(1-s) + cs + \frac{d s^3}{w_i(1-s) + 1}, \qquad s = \frac{t - t_i}{t_{i+1} - t_i},$$

where $w_i \ge 0$ are the tension parameters.
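The effect of the tension parameter in a rational segment of this kind can be sketched as follows. The function name `rational_segment` is hypothetical, and the vectors `a`, `b`, `c`, `d` are treated as given inputs, which is an assumption made for illustration; in the schemes of [6,7] they are determined by the interpolation and continuity conditions.

```python
def rational_segment(a, b, c, d, w):
    """r(s) = a (1-s)^3 / (w s + 1) + b (1-s) + c s + d s^3 / (w (1-s) + 1), s in [0, 1].

    a, b, c, d are points given as (x, y) tuples; w >= 0 is the tension parameter.
    """
    def r(s):
        ca = (1 - s)**3 / (w * s + 1)
        cd = s**3 / (w * (1 - s) + 1)
        return tuple(ca * ai + (1 - s) * bi + s * ci + cd * di
                     for ai, bi, ci, di in zip(a, b, c, d))
    return r
```

With `w = 0` the segment is an ordinary cubic in `s`. For large `w` and fixed coefficients the two rational terms are damped away from the end points, leaving the linear part `b (1 - s) + c s` in the interior.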
The variable degree tension method of [40], also mentioned in Section 2.2, was adapted
to  the  planar  case  in  [41],  and  extended  in  [27]  to  allow  the  designer  to  obtain  a  ‘fair’ 
curve  by  minimising  the  number  of  changes  in  the  monotonicity  of  the  curvature. 

3.3  Direct  methods 

The papers [34,35,28,23] give local, $G^2$ co-convex schemes, e.g. in [28], a rational
cubic/cubic is used on each interval $[t_i, t_{i+1}]$ and the tangent vectors and curvatures are
stipulated by the algorithm to ensure that the convexity conditions are satisfied and
circular arcs are reproduced, with the possibility of modifying the tangent vectors and
curvatures further as shape parameters.

Following an earlier scheme in [64], Schaback in [65] gives a global $G^2$ co-convex
scheme which uses a cubic polynomial on any interval $[t_i, t_{i+1}]$ where an inflection is
needed, and on other intervals employs quadratic polynomials.

Sapidis  and  Kaklis  [61]  give  a  G2  co-convex  scheme  by  interpolating  by  a  piecewise 
quintic  curve  tangent  directions  and  curvatures  gained  by  their  tension  method  [41]. 

In  [1]  a  local,  co-convex  G2  scheme  is  given  which  uses  polynomials  of  degree  six 
and  which  attempts  to  obtain  a  fair  curve  by  imposing  conditions  on  the  curvature  to 
minimise  measures  of  fairness.  Finally  we  note  that  in  [12]  Costantini  gives  an  abstract 
theory  and  general  purpose  code. 

4  Space  curves 

Given  data 

$$I_i \in \mathbb{R}^3, \quad i = 0, \dots, N,$$

we consider a curve $r : [a, b] \to \mathbb{R}^3$ satisfying condition (3.1) as before.

4.1  Desirable  properties 

What  is  meant  by  ‘shape-preserving’  is  not  so  clear  for  space  curves  as  for  the  planar 
case.  Criteria  were  introduced  by  Kaklis  and  Karavelas  [39]  and  extended  by  Ong  and 
the  author  in  [31].  We  shall  sketch  these  below.  They  are  discussed  in  further  detail  in 
[30],  where  some  further  extensions  are  suggested.  We  write,  for  appropriate  indices  i : 

$$L_i = I_{i+1} - I_i, \qquad \Delta_i = \det[L_{i-1}, L_i, L_{i+1}], \qquad N_i = L_{i-1} \times L_i.$$

Torsion. We ensure that the curve is 'twisting' in the same manner as the piecewise
linear interpolant by requiring that if $\Delta_i \ne 0$, then the torsion of $r$ has the same sign as
$\Delta_i$ on $(t_i, t_{i+1})$.

Convexity.  Let 

$$K(t) = r'(t) \times r''(t), \quad a \le t \le b.$$

We require that for $1 \le i \le N-1$, $K(t_i) \cdot N_i > 0$, which means that the projection of
the curve $r$ onto the plane of $I_{i-1}$, $I_i$, $I_{i+1}$ has the same sign of local convexity at $I_i$ as
the polygonal arc $I_{i-1} I_i I_{i+1}$. Moreover if $N_i \cdot N_{i+1} > 0$, we require

$$K(t) \cdot N_j > 0, \quad j = i, i+1, \quad t_i \le t \le t_{i+1},$$

which implies that the curve $r$ has the same sign of local convexity on $[t_i, t_{i+1}]$ when
projected in any direction $\lambda N_i + (1 - \lambda) N_{i+1}$ for $0 \le \lambda \le 1$. Finally we require that
if $N_i \cdot N_{i+1} < 0$, then for $j = i, i+1$, $K(t) \cdot N_j$ has exactly one sign change in $[t_i, t_{i+1}]$,
which implies that each of the above projections of $r$ has just one inflection.
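The discrete quantities that drive these criteria are easy to compute from the data. The following is a minimal sketch, assuming $L_i = I_{i+1} - I_i$, $N_i = L_{i-1} \times L_i$ and $\Delta_i = \det[L_{i-1}, L_i, L_{i+1}]$; the function name `discrete_shape_data` is hypothetical.

```python
def sub(p, q):
    return tuple(a - b for a, b in zip(p, q))

def cross(p, q):
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])

def dot(p, q):
    return sum(a * b for a, b in zip(p, q))

def discrete_shape_data(points):
    """Return (L, N, Delta) for data points I_0, ..., I_N given as 3-tuples.

    L is a list L_0..L_{N-1}; N and Delta are dicts keyed by the index i.
    """
    n = len(points) - 1
    L = [sub(points[i + 1], points[i]) for i in range(n)]
    N = {i: cross(L[i - 1], L[i]) for i in range(1, n)}
    # det[L_{i-1}, L_i, L_{i+1}] written as a scalar triple product
    Delta = {i: dot(L[i - 1], cross(L[i], L[i + 1])) for i in range(1, n - 1)}
    return L, N, Delta
```

The sign of `Delta[i]` is the sign the torsion of the curve is asked to match on the corresponding interval, and the sign of `dot(N[i], N[i + 1])` selects which of the two convexity conditions applies there.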

Smoothness.  This  is  as  for  planar  curves,  except  that  we  call  the  curve  G3  if  it  is  G2 
and,  in  addition,  the  torsion  is  continuous.  Other  desirable  properties  are  similar  to  the 
planar  case. 

4.2  Tension  methods 

Although  interpolation  by  space  curves  with  a  special  shape  is  considered  in  [44],  the 
first  specific  shape-preserving  interpolation  scheme  by  space  curves  was  due  to  Kaklis 
and  Karavelas  [39],  who  adapted  the  variable  degree  tension  method  of  [40]  to  give  a 
G2  method  which  was  also  G3,  but  at  the  expense  of  zero  torsion  at  the  data  points.  In 
[42] the same authors adapted Nielson's $\nu$-splines to the three dimensional case to give a
curve which is $C^1$ and $G^2$. The paper [14] also uses variable degree for tension parameters
but gives a $G^3$ scheme in which the limiting curve as the tension goes to infinity is not
the piecewise linear interpolant but the shape-preserving interpolant given by either of
the above two schemes. In [15] a $G^3$ scheme is also given but here the components of $r$
on each interval $[t_i, t_{i+1}]$ lie in the linear span of the functions

$$(1-u),\ u,\ (1-u)^{m_i},\ (1-u)^{m_i-1}u,\ (1-u)u^{m_{i+1}-1},\ u^{m_{i+1}}, \qquad u = \frac{t - t_i}{t_{i+1} - t_i}.$$

When $m_i = m_{i+1} = 5$, this reduces to a quintic polynomial. As $m_i, m_{i+1} \to \infty$, it tends
to a linear polynomial and then the curve $r$ approaches the piecewise linear interpolant
on $[t_i, t_{i+1}]$.

The  paper  [26]  also  uses  variable  degree  splines  with  degree  on  each  interval  at  least 
five,  and  the  curve  r  also  converges  to  the  piecewise  linear  interpolant  as  the  degrees  go 
to  infinity.  However  here  the  curve  is  C4,  which  the  authors  feel  may  give  extra  fairness 
to  the  curve  due,  for  example,  to  lowering  the  maximum  absolute  value  of  the  curvature. 
Variable  degree  polynomial  splines  are  also  used  in  [13]. 

4.3  Direct  methods 

Following  an  earlier  scheme  in  [31],  Ong  and  the  author  gave  a  local  G2  scheme  in  [32] 
which  employed  a  rational  cubic/cubic  between  data  points,  extending  the  ideas  of  the 
planar  scheme  in  [28].  This  was  further  extended  to  a  local  G3  scheme  using  a  rational 
quartic/quartic in [43]. In [33], the degrees of freedom inherent in the scheme in [32]
were  used  to  optimise  a  fairness  measure.  Finally  we  mention  the  papers  [2,3]  which  give 
local  G3  schemes  using  a  piecewise  polynomial  of  degree  six,  also  allowing  optimisation 
of  a  fairness  measure. 

It  will  be  noted  that  many  of  the  above  papers  are  extremely  recent  and  it  is  hoped 
that  the  unavoidable  lack  of  detail  here  will  serve  to  tantalise  readers  to  discover  for 
themselves  more  of  this  rapidly  developing  field. 


T. N. T. Goodman

Abstract

The  paper  outlines  procedures  for  extending  the  de  Casteljau,  de  Boor  and  Aitken  al¬ 
gorithms  in  such  a  way  as  to  allow  the  construction  on  a  Riemannian  manifold  of  curves 
analogous  to  Bezier,  B-spline,  and  Lagrange  curves.  These  curves  lie  in  the  manifold  and 
respect  intrinsic  geometry. 

1  Introduction 

Given a sequence of points in a Riemannian manifold $M$ we describe methods for extending
the de Casteljau, de Boor, and Aitken algorithms. These methods allow construction
of corresponding interpolating or approximating curves that lie in the manifold and
respect intrinsic geometry. In the case that the manifold is a sphere, opportunities for
applications exist in the domain of geological and geographical mapping, for instance
the creation of topographical contour lines or isotherms, and in the field of video
production, where it is desirable to have smooth camera trajectories interpolating fixed
camera positions. For higher dimensional manifolds there are applications in the field
of data analysis. For the case of a sphere, there is an extensive literature dealing with
the general problem of data fitting, and a superb review can be found in Fasshauer and
Schumaker [2]. Shoemake [7] uses properties of quaternion arithmetic to describe curves
on the unit quaternion sphere, and Levesley and Ragozin [4], using techniques different
from those presented in this paper, describe methods for Lagrange interpolation in
differentiable manifolds.

The  techniques  described  in  this  paper  come  from  the  simple  observation  that  in  the 
de  Casteljau,  de  Boor,  and  Aitken  algorithms  one  may  formally  substitute  appropriately 
parametrized  geodesic  arcs  for  straight  line  segments.  These  ideas  are  introduced  in  detail 
in the next section in the context of the blossoming paradigm, [6] and [3]. Unfortunately
many of the useful properties of blossoms depend on the affine structure of Euclidean
space which in general has no counterpart in a Riemannian manifold. In particular,
geodesic blossoms may be neither symmetric nor multi-affine, and in general they do not
possess uniqueness characteristics common to the Euclidean blossom.

For an arbitrary Riemannian manifold [1] or indeed an arbitrary differentiable
2-manifold embedded in $\mathbb{R}^3$, it may not be possible to construct unique shortest geodesic
arcs between two points. However, if the manifold is compact or in the case that the two
points lie in a sufficiently small neighborhood, such arcs are known to exist. But even
then, there appears to be no general method that allows explicit construction. So, the
task of constructing geodesic blossoms becomes a study of special cases in which specific
methods can be set forth. For the general case, a discrete variational method can be
used to obtain good approximations.

In  Section  3  a  few  specific  examples  are  discussed.  The  case  in  which  the  manifold 
is  a  sphere  is  given  special  attention.  There  we  introduce  a  variation  which  allows  the 
discussion  of  Archimedian  curves  which  are  constructed  by  substituting  Archimedian 
spirals  for  geodesics.  This  variation  allows  the  natural  construction  of  curves  that  lie  off 
the sphere. Although the spherical geodesic blossoms are neither symmetric nor
multi-affine, a simple reparametrization of geodesic arcs results in spherical blossoms that have
all  desirable  characteristics.  Section  3  also  contains  a  brief  discussion  of  the  problem  of 
finding  geodesics  in  developable  surfaces  and  in  surfaces  of  revolution. 

2  Preliminaries 

Let $M$ be a $C^\infty$ Riemannian manifold. There is the following theorem that guarantees
the existence locally of geodesics.

Theorem 2.1 Let $M$ be a Riemannian manifold and $x_0 \in M$. Then there exists a
neighborhood $V$ of $x_0$ and $\epsilon > 0$ so that if $x \in V$ and $v_x$ is a non-zero tangent vector at $x$
and $\|v_x\| < \epsilon$, then there is a unique $C^\infty$ geodesic $\alpha : (-2, 2) \to M$ defined on the open
interval $(-2, 2)$ such that $\alpha(0) = x$ and $\alpha'(0) = v_x$.

For  compact  Riemannian  manifolds  there  is  the  Hopf-Rinow  theorem  that  tells  us 
that  points  can  be  connected  by  geodesic  arcs. 

Theorem  2.2  (Hopf  and  Rinow)  If  a  connected  Riemannian  manifold  M  is  compact, 
then  any  pair  of  points  x  and  y  may  be  joined  by  a  geodesic  whose  length  corresponds  to 
the  distance  in  the  manifold  from  x  to  y. 

We also need the notion of geodesic convexity and the result of J. H. C. Whitehead
that geodesically convex neighborhoods exist for all $x \in M$.

Definition 2.3 Given a subset $X$ of $M$ and a point $x_0 \in X$, $X$ is star shaped with
respect to the point $x_0$, if for every $x \in X$ there is a unique shortest geodesic connecting
$x_0$ with $x$ which lies in $X$.

Definition  2.4  A  subset  X  of  M  is  geodesically  convex  if  it  is  star  shaped  with  respect 
to  each  of  its  points. 

Definition  2.5  Given  a  subset  A  of  a  geodesically  convex  set  X  the  geodesic  convex 
hull  of  A  is  the  smallest  convex  set  which  contains  A. 

Theorem 2.6 (J. H. C. Whitehead) Let $V$ be an open subset of a Riemannian manifold
$M$ and let $x \in M$, then there is a geodesically convex open neighborhood $U$ of $x$ such
that $U \subset V$.

Let $M$ be a Riemannian manifold and let $X$ be a geodesically convex subset of $M$.
Given points $P_i$ in $M$ we describe extensions of the de Casteljau, de Boor, and Aitken
algorithms.
2.1 Riemannian Lagrange curves

Let $M$ be a Riemannian manifold, and let $A = \{P_0, P_1, \dots, P_n\}$ be a subset of a
geodesically convex subset $X$. Given parameter points $t_0 < t_1 < \cdots < t_n$, assume that
$A$ is contained in a sufficiently small neighborhood in which specified geodesics exist.
For $0 \le i \le n-1$, define $\gamma_i^1 : [t_0, t_n] \to X$ to be the unique geodesic parametrized
so that $\gamma_i^1(t_i) = P_i$ and $\gamma_i^1(t_{i+1}) = P_{i+1}$. For $1 < r \le n$ and $0 \le i \le n-r$ define
$\gamma_i^r : [t_0, t_n]^r \to X$ so that $\gamma_i^r(u_1, u_2, \dots, u_{r-1}, \cdot)$ is the unique geodesic parametrized so
that $\gamma_i^r(u_1, u_2, \dots, u_{r-1}, t_i) = \gamma_i^{r-1}(u_1, u_2, \dots, u_{r-1})$ and
$\gamma_i^r(u_1, u_2, \dots, u_{r-1}, t_{i+r}) = \gamma_{i+1}^{r-1}(u_1, u_2, \dots, u_{r-1})$. The function $\gamma_0^n : [t_0, t_n]^n \to X$
is called the geodesic Aitken blossom associated with the points $P_i \in X$, $0 \le i \le n$, and
the parameter points $t_0 < t_1 < \cdots < t_n$. If $\Delta : [t_0, t_n] \to [t_0, t_n]^n$ is the diagonal map
defined by $\Delta(u) = (\underbrace{u, u, \dots, u}_{n})$, the geodesic Lagrange curve associated with $X$ and
the points $P_i$ is the function $\Gamma_0^n = \gamma_0^n \circ \Delta$.


Theorem 2.7 If $\Gamma_0^n : [t_0, t_n] \to M$ is the geodesic Lagrange curve associated with the
points $P_i \in M$, $0 \le i \le n$, as defined above, then $\Gamma_0^n(t_i) = P_i$.

Proof: Observe that for $1 \le r \le n$ and $0 \le i \le n-r$, $\gamma_i^r$ depends for its definition
only on the points $P_j$, where $i \le j \le i+r$. If $n = 1$, and we are given points $P_0$ and
$P_1$, the result follows from the definition of $\gamma_0^1$. Inductively assume it is true for $k < n$.
For $k = n$, if $i = 0$, by definition

$$\Gamma_0^n(t_0) = \gamma_0^n(\underbrace{t_0, t_0, \dots, t_0}_{n}) = \gamma_0^{n-1}(\underbrace{t_0, t_0, \dots, t_0}_{n-1}) = \cdots = \gamma_0^1(t_0) = P_0,$$

and likewise if $i = n$,

$$\Gamma_0^n(t_n) = \gamma_0^n(\underbrace{t_n, t_n, \dots, t_n}_{n}) = \gamma_1^{n-1}(\underbrace{t_n, t_n, \dots, t_n}_{n-1}) = \cdots = \gamma_{n-1}^1(t_n) = P_n.$$

For $i \ne 0$ and $i \ne n$, observe that the geodesics used in the construction of
$\gamma_0^{n-1}$ and $\gamma_1^{n-1}$ may be restricted respectively to the intervals $[t_0, t_{n-1}]$ and $[t_1, t_n]$ so
that $\gamma_0^{n-1}$ becomes the geodesic Aitken blossom associated with the points $P_0, P_1, \dots, P_{n-1}$
and the parameter points $t_0 < t_1 < \cdots < t_{n-1}$, and $\gamma_1^{n-1}$ becomes the geodesic Aitken
blossom associated with the points $P_1, P_2, \dots, P_n$ and the parameter points
$t_1 < t_2 < \cdots < t_n$. By the inductive assumption,

$$\gamma_0^{n-1}(\underbrace{t_i, t_i, \dots, t_i}_{n-1}) = P_i = \gamma_1^{n-1}(\underbrace{t_i, t_i, \dots, t_i}_{n-1}),$$

and consequently $\gamma_0^n(t_i, t_i, \dots, t_i, \cdot)$ is the geodesic connecting $\gamma_0^{n-1}(t_i, t_i, \dots, t_i)$ with
$\gamma_1^{n-1}(t_i, t_i, \dots, t_i)$, and is thus the constant function, $\gamma_0^n(t_i, t_i, \dots, t_i, u) = P_i$ for all
$u \in [t_0, t_n]$. Thus in particular, $\gamma_0^n(t_i, t_i, \dots, t_i) = \Gamma_0^n(t_i) = P_i$. □
2.2 Riemannian Bezier curves

Following the previous format we introduce a Riemannian version of the de Casteljau
algorithm. Accordingly, let $X$ be a geodesically convex subset of a Riemannian manifold
$M$. Let $A = \{P_0, P_1, \dots, P_n\}$ be a subset of $X$. Define $\gamma_i^0 : [0, 1] \to X$ by $\gamma_i^0(u) = P_i$.
For $1 \le r \le n$ and $0 \le i \le n-r$ define $\gamma_i^r : [0, 1]^r \to X$ so that
$\gamma_i^r(u_1, u_2, \dots, u_{r-1}, \cdot)$ is the unique geodesic with the property that
$\gamma_i^r(u_1, u_2, \dots, u_{r-1}, 0) = \gamma_i^{r-1}(u_1, u_2, \dots, u_{r-1})$ and
$\gamma_i^r(u_1, u_2, \dots, u_{r-1}, 1) = \gamma_{i+1}^{r-1}(u_1, u_2, \dots, u_{r-1})$. The function $\gamma_0^n : [0, 1]^n \to X$ is called
the geodesic de Casteljau blossom associated with the set $A$. If $\Delta : [0, 1] \to [0, 1]^n$ is
the diagonal map, the geodesic Bezier curve associated with $X$ and the set $A$ is the
function $\Gamma_0^n = \gamma_0^n \circ \Delta$.
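The recursion above can be sketched for the unit sphere, where each geodesic is a great circle arc. The names `arc`, `geodesic_blossom` and `geodesic_bezier` are hypothetical, arcs are given a constant-speed parametrization (an assumption of this sketch), and the points joined at each stage are assumed not to be antipodal.

```python
from math import atan2, sin, sqrt

def arc(p, q, s):
    """Constant-speed great circle arc from unit vector p (s = 0) to unit vector q (s = 1)."""
    c = (p[1] * q[2] - p[2] * q[1], p[2] * q[0] - p[0] * q[2], p[0] * q[1] - p[1] * q[0])
    theta = atan2(sqrt(sum(x * x for x in c)), sum(a * b for a, b in zip(p, q)))
    if theta == 0.0:
        return p
    a, b = sin((1 - s) * theta) / sin(theta), sin(s * theta) / sin(theta)
    v = [a * pi + b * qi for pi, qi in zip(p, q)]
    n = sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v)

def geodesic_blossom(points, us):
    """Geodesic de Casteljau blossom at (u_1, ..., u_n); stage r uses the argument u_r."""
    level = list(points)
    for u in us:
        level = [arc(level[i], level[i + 1], u) for i in range(len(level) - 1)]
    return level[0]

def geodesic_bezier(points, u):
    """Restriction of the blossom to the diagonal."""
    return geodesic_blossom(points, [u] * (len(points) - 1))
```

The curve starts at the first point and ends at the last one, and arguments consisting of zeros followed by ones recover the intermediate control points.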

2.3  Riemannian  B-Spline  curves 

Given $A = \{P_0, P_1, \dots, P_n\}$ contained in a geodesically convex subset $X$ of a Riemannian
manifold $M$, and given knots $t_1 < t_2 < \cdots < t_{2n}$, define $\gamma_i^0 : [t_1, t_{2n}] \to X$ by
$\gamma_i^0(t) = P_i$, for $0 \le i \le n$. For $1 \le r \le n$ and $r \le i \le n$, define $\gamma_i^r : [t_i, t_{i+n+1-r}]^r \to X$
to be the unique geodesic with the property that
$\gamma_i^r(u_1, u_2, \dots, u_{r-1}, t_i) = \gamma_{i-1}^{r-1}(u_1, u_2, \dots, u_{r-1})$ and
$\gamma_i^r(u_1, u_2, \dots, u_{r-1}, t_{i+n+1-r}) = \gamma_i^{r-1}(u_1, u_2, \dots, u_{r-1})$. The function
$\gamma_n^n : [t_n, t_{n+1}]^n \to X$ is called the geodesic de Boor blossom associated with the set $A$.
If $\Delta : [t_n, t_{n+1}] \to [t_n, t_{n+1}]^n$ is the diagonal map, the geodesic B-Spline curve associated
with $X$ and the points $P_i$ is the function $\Gamma_n^n = \gamma_n^n \circ \Delta$.

We have the following results, which follow from the fact that both the geodesic de
Casteljau and the geodesic de Boor blossoms are constructed from successive geodesic
combinations beginning with the set $A = \{P_0, P_1, \dots, P_n\}$.

Theorem 2.8 Given $A = \{P_0, P_1, \dots, P_n\}$ contained in a geodesically convex subset $X$
of a Riemannian manifold, if $\gamma_0^n : [0, 1]^n \to X$ is the geodesic de Casteljau blossom
of $A$, then $\gamma_0^n([0, 1]^n)$ is contained in the geodesic convex hull of the set $A$.

Theorem 2.9 Given $A = \{P_0, P_1, \dots, P_n\}$ contained in a geodesically convex subset $X$
of a Riemannian manifold, if $\gamma_n^n : [t_n, t_{n+1}]^n \to X$ is the geodesic de Boor blossom of
$A$ relative to a knot sequence $t_1 < t_2 < \cdots < t_{2n}$, then $\gamma_n^n([t_n, t_{n+1}]^n)$ is contained in
the geodesic convex hull of the set $A$.

Since each of the three blossoms is constructed successively from $C^\infty$ geodesics, it
follows that the blossoms and their restrictions to the diagonal are also of class $C^\infty$.

Theorem 2.10 The geodesic Lagrange, Bezier, B-spline curves are of class $C^\infty$ as are
each of their corresponding blossoms.

3  Examples 

The  impediments  to  implementation  of  these  ideas  depend  on  the  manifold  in  question. 
In  all  cases  it  is  necessary  that  the  points  Pi  should  lie  in  a  region  in  which  it  is  possible 
to  construct  geodesic  arcs  between  points.  The  problem  then  reduces  to  that  of  finding 
methods  for  such  constructions.  Even  in  cases  for  which  this  is  possible,  there  is  the 
additional  problem  that  many  of  the  desirable  properties  associated  with  B-spline  or 
Bezier  curves  in  R3  may  have  no  direct  analogs.  Many  properties  such  as  the  ability 
to subdivide a curve depend on the blossom being symmetric or multi-affine, and for
the generalizations presented here, this is seldom true. For the case of an orientable
2-manifold embedded in $\mathbb{R}^3$, there are in many cases good solutions to the problem of
finding geodesics, but different classes of surfaces lead to different solutions. In this section
we mention a few. In the case that the manifold $M$ is the 2-sphere $S^2$ a preliminary
version of our results is reported in [5].

3.1  The  sphere 

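In the case that $M = S^2$, a small alteration to methods presented so far allows the
consideration of curves that lie off the sphere. Given points $P$ and $Q$ that lie off the
sphere consider radial projections to points $\hat{P}$ and $\hat{Q}$ and let $\hat{\gamma} : [a, b] \to S^2$ be a
geodesic with the property that $\hat{\gamma}(a) = \hat{P}$ and $\hat{\gamma}(b) = \hat{Q}$. The curve $\gamma : [a, b] \to \mathbb{R}^3$
defined by

$$\gamma(t) = \left( \frac{b-t}{b-a}\,\|P\| + \frac{t-a}{b-a}\,\|Q\| \right) \hat{\gamma}(t)$$

is called the Archimedian spiral connecting the points $P$ and $Q$. To explicitly describe
the curve $\hat{\gamma}$, set $\hat{P} = v_1$, $\hat{Q} = v_2$ and for simplicity consider the parameter interval $[a, b]$
to be the unit interval $[0, 1]$. For $\langle \cdot, \cdot \rangle$ the standard inner product on $\mathbb{R}^3$ set

$$v_3 = \frac{v_2 - \langle v_1, v_2 \rangle v_1}{\|v_2 - \langle v_1, v_2 \rangle v_1\|}$$

so that $v_3$ is orthogonal to $v_1$ and in the plane containing $v_1$ and $v_2$. Letting $\theta = \arccos \langle v_1, v_2 \rangle$
denote the angle between $v_1$ and $v_2$, the geodesic $\hat{\gamma}$ connecting $v_1$ with $v_2$ is defined by

$$\hat{\gamma}(t) = \cos(t\theta)\, v_1 + \sin(t\theta)\, v_3
= \left( \cos(t\theta) - \frac{\sin(t\theta) \langle v_1, v_2 \rangle}{\|v_2 - \langle v_1, v_2 \rangle v_1\|} \right) v_1 + \frac{\sin(t\theta)}{\|v_2 - \langle v_1, v_2 \rangle v_1\|}\, v_2.$$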
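A minimal sketch of the Archimedian spiral on the unit parameter interval follows. It assumes that the unit vector orthogonal to the projection of $P$ is oriented towards the projection of $Q$, so that the arc ends at $Q$, and that $P$, $Q$ and the origin are not collinear; the function name `archimedian_spiral` is hypothetical.

```python
from math import acos, cos, sin, sqrt

def archimedian_spiral(P, Q, t):
    """Point at t in [0, 1] on the spiral from P to Q (3-tuples, not collinear with the origin)."""
    nP = sqrt(sum(a * a for a in P))
    nQ = sqrt(sum(a * a for a in Q))
    v1 = [a / nP for a in P]                        # radial projection of P
    v2 = [a / nQ for a in Q]                        # radial projection of Q
    c = sum(a * b for a, b in zip(v1, v2))          # inner product <v1, v2> = cos(theta)
    w = [b - c * a for a, b in zip(v1, v2)]         # component of v2 orthogonal to v1
    nw = sqrt(sum(a * a for a in w))
    v3 = [a / nw for a in w]
    theta = acos(max(-1.0, min(1.0, c)))            # angle between v1 and v2
    radius = (1 - t) * nP + t * nQ                  # norms interpolated linearly
    return tuple(radius * (cos(t * theta) * a + sin(t * theta) * b)
                 for a, b in zip(v1, v3))
```

For example, `archimedian_spiral((2, 0, 0), (0, 3, 0), 0.5)` lies at distance 2.5 from the origin on the bisector of the two directions.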

The  corresponding  Archimedian  Lagrange,  Bezier  and  B-spline  curves  may  now  be  con¬ 
structed  with  the  general  algorithms  of  Section  2. 

One  of  the  difficulties  that  arise  with  Archimedian  curves  is  that  geodesic  blossoms  are 
not  necessarily  symmetric  or  multi-affine.  It  is  even  not  clear  what  these  concepts  might 
mean  in  a  geodesic  context.  Consequently,  certain  results  that  hold  for  normal  Bezier  or 
B-spline  curves  that  depend  on  these  properties  are  no  longer  valid.  In  particular  analogs 
of  the  subdivision  algorithms  that  allow  one  to  determine  control  points  of  a  portion  of  a 
given  Bezier  or  B-spline  are  not  valid.  However,  it  can  be  shown  that  a  simple  non-linear 
change  in  the  parametrization  of  the  geodesic  arcs,  makes  it  possible  to  recapture  most 
of  what  is  needed. 

Definition 3.1 Given two points $A$ and $B$ on the sphere, let $C$ be the smaller arc of
the spherical geodesic joining $A$ with $B$. The barycentric parametrization of $C$ on the
parameter interval $[a, b]$ is the function $\alpha : [a, b] \to C$ defined by

$$\alpha(t) = q(x(t)),$$

where $x(t) = \frac{b-t}{b-a} A + \frac{t-a}{b-a} B$ and $q : \mathbb{R}^3 \to S^2$ is the radial projection $q(x) = \frac{x}{\|x\|}$.

In  the  following  we  prove  a  spherical  version  of  the  Menelaus  theorem. 
Theorem 3.2 Given 3 points $P_0$, $P_1$, $P_2$ on $S^2$ let $\gamma : [0, 1] \times [0, 1] \to \mathbb{R}^3$ be the geodesic
de Casteljau blossom in which all geodesic arcs are given the barycentric parametrization.
Then $\gamma(s, t) = \gamma(t, s)$.

Proof: Observe that an elementary geometric argument tells us that:

$$\gamma(s, t) = \gamma_0^2(s, t) = q\big( (1-t)\,\gamma_0^1(s) + t\,\gamma_1^1(s) \big)
= q\big( (1-t)[(1-s)P_0 + sP_1] + t[(1-s)P_1 + sP_2] \big)$$

and

$$\gamma(t, s) = \gamma_0^2(t, s) = q\big( (1-s)\,\gamma_0^1(t) + s\,\gamma_1^1(t) \big)
= q\big( (1-s)[(1-t)P_0 + tP_1] + s[(1-t)P_1 + tP_2] \big).$$

And the result follows from the affine properties of $\mathbb{R}^3$. □

As  an  immediate  consequence  we  have 

Theorem 3.3 Given points $P_0, P_1, \dots, P_n$ on $S^2$, the associated de Casteljau blossom,
in which geodesic arcs are given barycentric parametrization, is symmetric.

The conventional blossoming description of subdivision can now be employed. From
the blossom construction we can conclude that $\gamma_0^n(0, 0, \dots, 0, \underbrace{1, 1, \dots, 1}_{i}) = P_i$. In
particular, it follows that, for $0 \le u \le 1$, the points $Q_i = \gamma_0^n(0, 0, \dots, 0, \underbrace{u, u, \dots, u}_{i})$
describe a geodesic de Casteljau blossom which is parametrized to the interval $[0, u]$
and which, because of the uniqueness of geodesic arcs, equals the restriction of $\gamma_0^n$ to
$[0, u]^n$. Likewise, for the interval $[u, 1]$ the points $R_i = \gamma_0^n(u, u, \dots, u, \underbrace{1, 1, \dots, 1}_{i})$
determine a geodesic de Casteljau blossom which is parametrized to the interval $[u, 1]$
and which equals the restriction of $\gamma_0^n$ to $[u, 1]^n$. Therefore, if $g : [0, 1] \to S^2$ is the
geodesic Bezier curve determined by $P_0, P_1, \dots, P_n$ and if $g = \gamma_0^n \circ \Delta$, it follows that
$g|_{[0,u]} : t \mapsto \gamma_0^n(t, \dots, t, \underbrace{u, u, \dots, u}_{i})$ and $g|_{[u,1]} : t \mapsto \gamma_0^n(u, u, \dots, u, \underbrace{t, t, \dots, t}_{i})$, for
$0 \le u \le 1$.

More generally and along the lines of the proof above, we have the following theorem
which allows all familiar properties of both Bezier and B-spline curves which have
descriptions in terms of their corresponding blossoms to carry over to the spherical case.

Theorem 3.4 Let $f : [0, 1]^n \to \mathbb{R}^3$ be the Euclidean blossom generated by the de
Casteljau algorithm using points $P_i \in S^2$, $0 \le i \le n$. Then $\gamma_0^n = q \circ f$.
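The composite $q \circ f$ is straightforward to compute. The sketch below evaluates only that composite, the Euclidean de Casteljau blossom followed by radial projection; it does not build the geodesic blossom independently, so it illustrates the properties inherited from $f$ rather than verifying the theorem. The function names are hypothetical.

```python
from math import sqrt

def euclidean_blossom(points, us):
    """Multi-affine de Casteljau blossom f(u_1, ..., u_n) of n + 1 control points."""
    level = [tuple(p) for p in points]
    for u in us:
        level = [tuple((1 - u) * a + u * b for a, b in zip(level[i], level[i + 1]))
                 for i in range(len(level) - 1)]
    return level[0]

def radial_projection(x):
    """q(x) = x / ||x|| for a nonzero vector x."""
    n = sqrt(sum(a * a for a in x))
    return tuple(a / n for a in x)

def projected_blossom(points, us):
    """The composite q o f evaluated at (u_1, ..., u_n)."""
    return radial_projection(euclidean_blossom(points, us))
```

Because $f$ is symmetric in its arguments, so is the composite, and arguments of zeros and ones return the control points themselves when these lie on the sphere.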

3.2  Other  surfaces 

We  briefly  discuss  two  examples  in  which  explicit  descriptions  of  geodesics  between  points 
are  possible. 

A developable surface $S$ [4], described as the image of a function $f : U \to \mathbb{R}^3$ for
$U$ an open subset of $\mathbb{R}^2$, possesses the characteristic, among others, that distances are
preserved by the function $f$. Therefore, a geodesic in the surface $f(U)$ may be considered
as the image of a straight line in the plane. If $P_0, P_1, \dots, P_n$ are points in $S$, let $Q_i =
f^{-1}(P_i)$, $0 \le i \le n$. If $C \subset U$ is the Lagrange, Bezier, or B-spline curve obtained
from the points $Q_i$ by the standard Euclidean versions of the algorithms, then it follows that $f(C)$ is the
corresponding geodesic curve in $S$ that would have been obtained using geodesic versions
of the algorithms that we have described.
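
As a small illustration of this remark, the sketch below takes the unit cylinder as the developable surface, with the (assumed) isometric parametrization $f(s,t) = (\cos s, \sin s, t)$, runs the ordinary planar de Casteljau algorithm on the preimages $Q_i$, and maps the result onto the surface. The choice of surface and the function names are assumptions made for the example; the paper does not single out the cylinder.

```python
import math

def de_casteljau(ctrl, t):
    """Standard Euclidean de Casteljau evaluation at parameter t."""
    pts = [tuple(p) for p in ctrl]
    while len(pts) > 1:
        pts = [tuple((1 - t) * a + t * b for a, b in zip(p, q))
               for p, q in zip(pts, pts[1:])]
    return pts[0]

def cylinder(s, t):
    """Isometric map from the (s, t) plane onto the unit cylinder."""
    return (math.cos(s), math.sin(s), t)

def geodesic_bezier_on_cylinder(plane_ctrl, t):
    """Image under f of the planar Bezier curve with control points Q_i."""
    return cylinder(*de_casteljau(plane_ctrl, t))
```

The construction is only meaningful while the planar control polygon stays in a region where the map is injective, here a strip of width less than $2\pi$ in $s$.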

For surfaces of revolution the description of geodesics between two points is rather
more involved. Let $C$ be a curve in the $yz$-plane described implicitly by

$$\begin{cases} f(y) = z \\ x = 0 \end{cases},$$

for $(y, z)$ belonging to some open set $U$ contained in the upper half of the $yz$-plane. The
surface $S$ obtained by rotating $C$ about the $z$-axis may be expressed as $g^{-1}(0)$ where
$g : \mathbb{R} \times U \to \mathbb{R}$ is defined by $g(x,y,z) = f(\sqrt{x^2 + y^2}) - z$. In polar coordinates,
letting $u = \sqrt{x^2 + y^2}$, we express $S$ in the form

$$x = u\cos\theta, \qquad y = u\sin\theta, \qquad z = f(u).$$

Let $P = (u_1\cos\theta_1, u_1\sin\theta_1, f(u_1))$ and $Q = (u_2\cos\theta_2, u_2\sin\theta_2, f(u_2))$ be two points
on $S$. Then it may be shown that the geodesic connecting $P$ with $Q$ is the function
$\alpha : [u_1, u_2] \to S$ such that $\alpha(u) = (u\cos\theta(u), u\sin\theta(u), f(u))$, where for fixed $u_0$,

$$\theta(u) = \int_{u_0}^{u} \sqrt{\frac{1 + (f'(t))^2}{c\,t^4 - t^2}}\,dt + c',$$

and constants $c$ and $c'$ satisfy the following equations:

$$\theta_2 - \theta_1 = \int_{u_1}^{u_2} \sqrt{\frac{1 + (f'(u))^2}{c\,u^4 - u^2}}\,du, \qquad
c' = \theta_1 - \int_{u_0}^{u_1} \sqrt{\frac{1 + (f'(u))^2}{c\,u^4 - u^2}}\,du.$$


For  complete  details  see  [6]. 


4  Conclusion  and  future  research 

We have outlined a procedure by which conventional computer aided design constructions
may be extended to arbitrary Riemannian manifolds. In practice, there are difficulties.
In a given manifold, points to be interpolated or approximated must lie in a region in
which it is possible to construct the necessary geodesic arcs. Supposing this is the case, one
then needs to find explicit descriptions of the geodesics. And then there is the question
of the additional characteristics which the curves might possess. The paper raises more
questions than it answers. In the case of a sphere, good results are obtained, and it
is also possible to add variation that allows consideration of curves off the sphere but
which project radially to geodesic Lagrange, Bezier, or B-spline curves. It is also shown,
in the spherical case, that a change of parametrization of geodesics results in blossoms that
retain the desirable characteristics associated with Euclidean blossoms. For surfaces of
revolution and developable surfaces, we know that geodesics can be found between points,
so the geodesic blossom constructions will always exist. It is however unlikely that these
blossoms will be either symmetric or multi-affine; these characteristics depend on the
affine structure of $\mathbb{R}^3$. Thus, in the case of a general Riemannian manifold, although the
constructions may be valid, it is not clear that we will be able to employ fundamental
operations such as subdivision which depend on the symmetry of the blossom. We have
outlined three different methods of blossom construction, one for each of the algorithms
considered. In the Euclidean case, we know that there is a unique symmetric, multi-affine
polynomial that restricts to a given polynomial on the diagonal. This may not be
true in our more general setting.

Bibliography 

1.  Conlon,  L.,  Differential  Manifolds,  a  First  Course,  Birkhauser,  Boston,  1993. 

2.  Fasshauer,  G.  E.  and  Schumaker,  L.  L.,  Data  Fitting  on  the  Sphere,  in  Mathematical 
Methods  for  Curves  and  Surfaces  II,  Daehlen,  M.,  Lyche,  T.,  and  Schumaker,  L.  L. 
(eds),  Vanderbilt  University  Press,  Nashville,  1998,  117-166. 

3. Gallier, J., Curves and Surfaces in Geometric Modeling, Theory and Applications,
Morgan Kaufmann, San Francisco, 2000.

4. Oprea, J., Differential Geometry and its Applications, Prentice Hall, Upper Saddle
River, NJ, 1997.

5.  Levesley,  J.,  and  Ragozin,  D.  L.,  Local  Approximation  on  Manifolds  Using  Radial 
Basis  Functions  and  Polynomials,  in  Curve  and  Surface  Fitting,  Cohen,  A.,  Rabut, 
C.R.,  Schumaker,  L.L.  (eds),  Vanderbilt  University  Press,  Nashville,  2000,  291-301. 

6.  Lin,  A.,  Geodesics  between  points  on  surfaces  of  revolution,  Tech.  Report,  Dept. 
Mathematics,  York  University,  Toronto,  May  2001. 

7. Ramshaw, L., Blossoming: A Connect the Dots Approach to Splines, Digital Systems
Research Center, Report 19, Palo Alto, CA, 1987.

8. Shoemake, K., Animating Rotation with Quaternion Curves, ACM Proceedings, San
Francisco, July 22-26, 9, 1985, 245-254.

9.  Walker,  M.,  Curves  over  a  Sphere,  preprint,  2000. 


Parametric shape-preserving spatial interpolation and ν-splines

Carla Manni

Department of Mathematics, University of Torino, Italy
manni@dm.unito.it


Abstract 

In this paper we present a class of $C^2$ spatial interpolating curves depending on a set
of tension parameters and we illustrate their ability to reproduce the shape of the data.
The curves are constructed using cubic splines and basically reduce to classical ν-splines
for particular values of the tension parameters.

1  Introduction 

Shape-preserving interpolation via functional as well as parametric splines is a well
studied topic for the planar case. On the other hand, shape-preserving interpolation for
space curves is considerably more complex than for planar ones and the related literature
is apparently limited. In this respect, a considerable part of the available schemes only
ensures geometric continuity of the obtained curve (see [1, 8] and references quoted
therein). Recently, $C^2$ and $C^3$ shape-preserving interpolating space curves have been
obtained using polynomial splines of variable degree [2, 3, 6]. However, working with
low (fixed) degree polynomial splines seems to be a standard choice in the CAD/CAM
community. This motivates the careful investigation of shape-preserving properties of
cubic ν-splines recently carried out in [7] and the present paper.

In this paper we present a method for constructing $C^2$ spatial interpolating curves
reproducing the shape of the polygonal line which interpolates the given data. The curve
is constructed via the so called “parametric approach” [10], using classical cubic splines.
The shape of the curve is controlled by the amplitude of the tangent vectors at the data
sites, which play the role of tension parameters. It turns out that, for particular values
of the tension parameters, the proposed scheme provides a new, geometrically evident,
description of classical $C^1$-$G^2$ cubic ν-splines [11]. Moreover, the method produces a
suitable reparameterization for the above mentioned curves ensuring $C^2$ continuity. The
reparameterization is a cubic polynomial involving the tension parameters (see (3.3)).
Thus, the evaluation of the curve for a fixed value of the new parameter requires the
solution of a cubic equation.

The geometric meaning of the tension parameters coupled with the powerful “shape-preserving”
properties of the Bernstein-Bezier representation can be efficiently used to
construct an iterative algorithm for $C^2$ shape-preserving interpolation. The algorithm
converges in a finite number of iterations and requires at each iteration the solution of
a diagonally dominant linear system.

The  paper  is  organized  as  follows.  In  Section  2  we  state  the  problem.  In  Section  3  we 
describe  the  construction  of  the  required  interpolant  and  we  illustrate  its  dependence  on 
the  tension  parameters.  The  asymptotic  behavior  and  the  shape-preserving  properties 
of  the  obtained  curve  are  briefly  discussed  in  Section  4.  We  conclude  in  Section  5  with 
a  graphical  example. 

2  The  problem 

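In this section we introduce the problem of shape-preserving interpolation by curves in
$\mathbb{R}^3$. The adopted notion of shape-preserving follows the definitions of [2] and [6]. Let

$$\mathbf{I}_i \in \mathbb{R}^3, \quad i = 0, \dots, N,$$

be the interpolation points with $\mathbf{I}_i \neq \mathbf{I}_{i+1}$. Define, for all admissible indices,

$$\mathbf{L}_i := \mathbf{I}_{i+1} - \mathbf{I}_i,$$

$$\mathbf{N}_i := \begin{cases} \dfrac{\mathbf{L}_{i-1} \times \mathbf{L}_i}{\|\mathbf{L}_{i-1} \times \mathbf{L}_i\|} & \text{if } \|\mathbf{L}_{i-1} \times \mathbf{L}_i\| > 0, \\ \mathbf{0} & \text{elsewhere,} \end{cases}$$

$$\Delta_i := \begin{cases} \dfrac{|\mathbf{L}_{i-1}\ \mathbf{L}_i\ \mathbf{L}_{i+1}|}{\|\mathbf{L}_{i-1} \times \mathbf{L}_i\|\,\|\mathbf{L}_i \times \mathbf{L}_{i+1}\|} & \text{if } \|\mathbf{L}_{i-1} \times \mathbf{L}_i\|\,\|\mathbf{L}_i \times \mathbf{L}_{i+1}\| > 0, \\ 0 & \text{elsewhere,} \end{cases}$$

where $|\mathbf{a}\ \mathbf{b}\ \mathbf{c}|$ denotes the determinant of the matrix with columns $\mathbf{a}, \mathbf{b}, \mathbf{c}$. The vectors
$\mathbf{N}_i$ and the scalars $\Delta_i$ are, respectively, the discrete binormals and the discrete torsions
of the data.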
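
The discrete quantities above are cheap to compute directly from the data. The following Python sketch assumes the chord convention $\mathbf{L}_i = \mathbf{I}_{i+1} - \mathbf{I}_i$ and uses the identity $|\mathbf{a}\ \mathbf{b}\ \mathbf{c}| = (\mathbf{a} \times \mathbf{b}) \cdot \mathbf{c}$ for the determinant; the function name `discrete_shape` and the dictionary layout of the result are hypothetical.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def discrete_shape(points):
    """Chords L_i, discrete binormals N_i and discrete torsions Delta_i."""
    L = [sub(q, p) for p, q in zip(points, points[1:])]
    N = {}
    for i in range(1, len(L)):              # N_i needs L_{i-1} and L_i
        c = cross(L[i - 1], L[i])
        n = norm(c)
        N[i] = tuple(x / n for x in c) if n > 0 else (0.0, 0.0, 0.0)
    D = {}
    for i in range(1, len(L) - 1):          # Delta_i also needs L_{i+1}
        a = cross(L[i - 1], L[i])
        b = cross(L[i], L[i + 1])
        den = norm(a) * norm(b)
        # det[L_{i-1}, L_i, L_{i+1}] = (L_{i-1} x L_i) . L_{i+1}
        D[i] = dot(a, L[i + 1]) / den if den > 0 else 0.0
    return L, N, D
```

For data sampled from a right-handed helix all discrete torsions come out positive, while planar data give zero torsions and binormals parallel to the plane normal.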

Let the parameter values $\sigma_i$, $i = 0, \dots, N$, with $\sigma_i < \sigma_{i+1}$ be given, and let

$$h_i := \sigma_{i+1} - \sigma_i, \quad i = 0, 1, \dots, N-1,$$

be the corresponding spacings. We wish to construct a curve $\mathbf{Q}(s)$, $s \in [\sigma_0, \sigma_N]$, which
interpolates the data, $\mathbf{Q}(\sigma_i) = \mathbf{I}_i$, $i = 0, \dots, N$, such that $\mathbf{Q} \in C^2[\sigma_0, \sigma_N]$. In addition,
we also require that $\mathbf{Q}(s)$ is shape-preserving, that is, it reproduces the convexity
and torsion of the polygonal line connecting the interpolation points. More specifically,
denoting with dashes derivatives with respect to the parameter $s$, we define

$$\mathbf{K}(s) := \frac{\mathbf{Q}'(s) \times \mathbf{Q}''(s)}{\|\mathbf{Q}'(s)\|^3}, \ \text{if } \|\mathbf{Q}'(s)\| \neq 0; \qquad
\tau(s) := \frac{|\mathbf{Q}'(s)\ \mathbf{Q}''(s)\ \mathbf{Q}'''(s)|}{\|\mathbf{Q}'(s) \times \mathbf{Q}''(s)\|^2}, \ \text{if } \|\mathbf{K}(s)\| \neq 0 \qquad (2.1)$$

as the curvature vector and the torsion of the curve respectively. $\mathbf{Q}(s)$ is shape-preserving
if it satisfies the following criteria ([2, 6, 7]).

(i) Convexity criteria:

(i.1) if $\mathbf{N}_i \cdot \mathbf{N}_{i+1} > 0$, then $\mathbf{K}(s) \cdot \mathbf{N}_j > 0$, $j = i, i+1$, $s \in [\sigma_i, \sigma_{i+1}]$,

(i.2) if $\mathbf{N}_i \cdot \mathbf{N}_{i+1} < 0$, then $\mathbf{K}(s) \cdot \mathbf{N}_j$, $j = i, i+1$, has one change in sign in $[\sigma_i, \sigma_{i+1}]$,

(i.3) if $\mathbf{N}_i \cdot \mathbf{N}_j \neq 0$ then $(\mathbf{K}(\sigma_i) \cdot \mathbf{N}_j)(\mathbf{N}_i \cdot \mathbf{N}_j) > 0$, $j = i-1, i, i+1$.

(ii) Torsion criteria: if $\Delta_i \neq 0$ then $\tau(s)\Delta_i > 0$, $s \in [\sigma_i, \sigma_{i+1}]$.

For the sake of brevity we refer to [7] for the more technical collinearity and coplanarity
criteria.

3  Constructing  the  interpolating  curve 


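In order to construct the curve $\mathbf{Q}$ we consider, as a first step, a cubic curve $\mathbf{C}$ interpolating
the data. We put

$$\mathbf{C}(t)|_{[\sigma_i, \sigma_{i+1}]} := \mathbf{C}_i(t; \lambda_i^{(0)}, \lambda_i^{(1)}), \qquad (3.1)$$

$$\mathbf{C}_i(t; \lambda_i^{(0)}, \lambda_i^{(1)}) := \mathbf{I}_i H_0^{(0)}(u) + \mathbf{I}_{i+1} H_1^{(0)}(u) + \lambda_i^{(0)} h_i \mathbf{T}_i H_0^{(1)}(u) + \lambda_i^{(1)} h_i \mathbf{T}_{i+1} H_1^{(1)}(u),$$

$$t \in [\sigma_i, \sigma_{i+1}], \quad u := (t - \sigma_i)/h_i, \qquad (3.2)$$

where $0 \le \lambda_i^{(0)}, \lambda_i^{(1)} \le 1$ are shape parameters, $\mathbf{T}_i$, $\mathbf{T}_{i+1}$ are vectors to be determined
and $H_r^{(j)}(u)$ denote the elements of the cardinal basis for cubic Hermite interpolation,
that is, $H_r^{(j)}(u)$ are the polynomials of third degree such that

$$\frac{d^l H_r^{(j)}}{du^l}(k) = \delta_{jl}\,\delta_{rk}, \quad j, l, r, k = 0, 1.$$

One can immediately verify that the curve (3.2) interpolates the points $\mathbf{I}_i$, $\mathbf{I}_{i+1}$ at the
extremes of the interval $[\sigma_i, \sigma_{i+1}]$ and has tangent vectors $\lambda_i^{(0)}\mathbf{T}_i$, $\lambda_i^{(1)}\mathbf{T}_{i+1}$ at the same
extremes. The parameters $\lambda_i^{(0)}, \lambda_i^{(1)}$ determine the amplitude of the tangent vectors of
the curve at the two end points of the interval and they control the shape of the curve. To
be more specific, since $H_0^{(0)}(u) + H_1^{(0)}(u) = 1$, we have that $\mathbf{C}_i(t; 0, 0)$ reduces to the line
through $\mathbf{I}_i$, $\mathbf{I}_{i+1}$. Thus, the parameters $\lambda_i^{(0)}, \lambda_i^{(1)}$ act as tension parameters stretching
the curve from the classical Hermite cubic interpolating $\mathbf{I}_i$, $\mathbf{I}_{i+1}$ with tangents $\mathbf{T}_i$, $\mathbf{T}_{i+1}$
($\lambda_i^{(0)}, \lambda_i^{(1)} = 1$) to the line segment ($\lambda_i^{(0)}, \lambda_i^{(1)} = 0$). The curve (3.1) turns out to be of
class $G^1$.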
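
To make the role of the tension parameters concrete, the following Python sketch evaluates one segment of (3.2) with the standard cubic Hermite cardinal basis. The function names and the argument order are hypothetical; only the formula itself is taken from the text.

```python
def hermite_basis(u):
    """Cardinal cubic Hermite basis: H0^(0), H1^(0), H0^(1), H1^(1) at u."""
    return (1 - 3 * u**2 + 2 * u**3,   # value 1 at u = 0
            3 * u**2 - 2 * u**3,       # value 1 at u = 1
            u * (1 - u)**2,            # derivative 1 at u = 0
            u**2 * (u - 1))            # derivative 1 at u = 1

def segment(t, sigma0, sigma1, I0, I1, T0, T1, lam0, lam1):
    """C_i(t; lam0, lam1) of (3.2) on [sigma0, sigma1]."""
    h = sigma1 - sigma0
    u = (t - sigma0) / h
    H00, H10, H01, H11 = hermite_basis(u)
    return tuple(H00 * a + H10 * b + lam0 * h * H01 * c + lam1 * h * H11 * d
                 for a, b, c, d in zip(I0, I1, T0, T1))
```

With `lam0 = lam1 = 0` every evaluated point lies on the chord between the two data points, and with `lam0 = lam1 = 1` the segment is the classical Hermite cubic.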

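Let us consider now the new global parameter

$$s := S_i(t; \lambda_i^{(0)}, \lambda_i^{(1)}) := \sigma_i H_0^{(0)}(u) + \sigma_{i+1} H_1^{(0)}(u) + \lambda_i^{(0)} h_i H_0^{(1)}(u) + \lambda_i^{(1)} h_i H_1^{(1)}(u). \qquad (3.3)$$

It is not difficult to see that, if

$$0 < \lambda_i^{(0)}, \lambda_i^{(1)} \le 1, \qquad (3.4)$$

then

$$\frac{dS_i(t; \lambda_i^{(0)}, \lambda_i^{(1)})}{dt} > 0, \quad t \in [\sigma_i, \sigma_{i+1}].$$

Thus (3.3) implicitly defines a function $t = t(s)$, which provides a reparameterization for
(3.1). In the following we assume that conditions (3.4) hold and we define

$$\mathbf{Q}(s) := \mathbf{C}(t(s)). \qquad (3.5)$$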
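
Because $S_i$ is strictly increasing under (3.4), the cubic equation $S_i(t) = s$ has exactly one root in $[\sigma_i, \sigma_{i+1}]$. The sketch below finds it by bisection, which is an assumption of this example (any cubic root finder would do) chosen because it relies only on monotonicity. The function names are hypothetical.

```python
def S(t, sigma0, sigma1, lam0, lam1):
    """Global parameter s = S_i(t; lam0, lam1) of (3.3) on one interval."""
    h = sigma1 - sigma0
    u = (t - sigma0) / h
    return (sigma0 * (1 - 3 * u**2 + 2 * u**3)
            + sigma1 * (3 * u**2 - 2 * u**3)
            + lam0 * h * u * (1 - u)**2
            + lam1 * h * u**2 * (u - 1))

def t_of_s(s, sigma0, sigma1, lam0, lam1, iterations=80):
    """Invert the monotone cubic S by bisection on [sigma0, sigma1]."""
    lo, hi = sigma0, sigma1
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if S(mid, sigma0, sigma1, lam0, lam1) < s:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Evaluating $\mathbf{Q}(s)$ then amounts to computing `t_of_s(s, ...)` and evaluating the cubic segment at that value of $t$. For `lam0 = lam1 = 1` the map reduces to $s = t$.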

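Since $\mathbf{Q}'(\sigma_i) = \mathbf{T}_i$, $i = 0, \dots, N$, $\mathbf{Q}$ is of class $C^1$. For each sequence of the tension
parameters $\lambda_i^{(0)}, \lambda_i^{(1)}$ we will determine the tangent vectors $\mathbf{T}_i$, $\mathbf{T}_{i+1}$ so that $\mathbf{Q}$ is also
of class $C^2$. Let us denote by dots derivatives with respect to the local parameter $u$.
Imposing continuity of $\mathbf{Q}''(s)$ at $\sigma_i$, $i = 1, \dots, N-1$, from (3.3), (3.5) and from the
chain rule for derivatives, we obtain

$$\frac{\ddot{\mathbf{C}}_{i-1}(1^-)\,h_{i-1}\lambda_{i-1}^{(1)} - \ddot{S}_{i-1}(1^-)\,h_{i-1}\lambda_{i-1}^{(1)}\,\mathbf{T}_i}{(h_{i-1}\lambda_{i-1}^{(1)})^3}
= \frac{\ddot{\mathbf{C}}_i(0^+)\,h_i\lambda_i^{(0)} - \ddot{S}_i(0^+)\,h_i\lambda_i^{(0)}\,\mathbf{T}_i}{(h_i\lambda_i^{(0)})^3}. \qquad (3.6)$$

Thus, after some manipulations, from (3.2) we have

$$u_i\mathbf{T}_{i-1} + \mathbf{T}_i + v_i\mathbf{T}_{i+1} = \mathbf{z}_i, \quad i = 1, \dots, N-1, \qquad (3.7)$$

where

$$u_i := \frac{h_{i-1}\lambda_{i-1}^{(0)}(h_i\lambda_i^{(0)})^2}{w_i}, \qquad
v_i := \frac{h_i\lambda_i^{(1)}(h_{i-1}\lambda_{i-1}^{(1)})^2}{w_i},$$

$$w_i := h_{i-1}(3 - \lambda_{i-1}^{(0)})(h_i\lambda_i^{(0)})^2 + h_i(3 - \lambda_i^{(1)})(h_{i-1}\lambda_{i-1}^{(1)})^2, \qquad (3.8)$$

$$\mathbf{z}_i := \frac{3}{w_i}\left[\mathbf{L}_i(h_{i-1}\lambda_{i-1}^{(1)})^2 + \mathbf{L}_{i-1}(h_i\lambda_i^{(0)})^2\right].$$

In order to uniquely determine the vectors $\mathbf{T}_i$ we need two additional equations that
will be obtained by imposing boundary conditions. Classical boundary conditions are
periodic conditions:

$$u_0\mathbf{T}_{N-1} + \mathbf{T}_0 + v_0\mathbf{T}_1 = \mathbf{z}_0, \qquad u_N\mathbf{T}_{N-1} + \mathbf{T}_N + v_N\mathbf{T}_1 = \mathbf{z}_N$$

(with $u_0, v_0, u_N, v_N, \mathbf{z}_0, \mathbf{z}_N$ defined according to (3.8) setting $h_{-1} = h_{N-1}$, $\lambda_{-1}^{(0)} = \lambda_{N-1}^{(0)}$,
$\lambda_{-1}^{(1)} = \lambda_{N-1}^{(1)}$, $\mathbf{L}_{-1} = \mathbf{L}_{N-1}$, $h_N = h_0$, $\lambda_N^{(0)} = \lambda_0^{(0)}$, $\lambda_N^{(1)} = \lambda_0^{(1)}$, $\mathbf{L}_N = \mathbf{L}_0$) and end
tangent conditions:

$$\mathbf{T}_0 = \mathbf{D}_0, \qquad \mathbf{T}_N = \mathbf{D}_N$$

(where $\mathbf{D}_0$, $\mathbf{D}_N$ are given in input). In the following we will denote by $\mathcal{I}$ the set of indices
$\{1, \dots, N-1\}$ ($\{0, \dots, N\}$) when end tangent (periodic) conditions are considered. It is
not difficult to see that (3.7), for any choice of the above mentioned boundary conditions,
provides a diagonally dominant system

$$A\mathbf{T} = \mathbf{z}. \qquad (3.9)$$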


Thus we can state the following

Theorem 3.1 For any sequence $\lambda_i^{(0)}, \lambda_i^{(1)}$, $i = 0, \dots, N-1$, satisfying (3.4), there exists
a unique $\mathbf{Q} \in C^2[\sigma_0, \sigma_N]$ defined via (3.1)-(3.3), (3.5) which interpolates the given data
and satisfies periodic or end tangent conditions.
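
For end tangent conditions the system (3.9) is tridiagonal with unit diagonal, so it can be solved componentwise by the Thomas algorithm. The Python sketch below assembles the coefficients as in (3.8), using the chord convention $\mathbf{L}_i = \mathbf{I}_{i+1} - \mathbf{I}_i$. The helper `second_derivative_jump` is not from the paper: it uses one-sided expressions for $\mathbf{Q}''(\sigma_i)$ obtained here from (3.2), (3.3) and the chain rule, and serves only as a numerical check of $C^2$ continuity. All names are hypothetical.

```python
def solve_tangents(I, sigma, lam0, lam1, D0, DN):
    """Tangents T_0..T_N from (3.7)-(3.8) with T_0 = D0, T_N = DN."""
    N = len(I) - 1
    h = [sigma[i + 1] - sigma[i] for i in range(N)]
    L = [[b - a for a, b in zip(I[i], I[i + 1])] for i in range(N)]
    u, v = [0.0] * (N + 1), [0.0] * (N + 1)
    z = [list(D0)] + [None] * (N - 1) + [list(DN)]
    for i in range(1, N):
        A = (h[i] * lam0[i]) ** 2
        B = (h[i - 1] * lam1[i - 1]) ** 2
        w = h[i - 1] * (3 - lam0[i - 1]) * A + h[i] * (3 - lam1[i]) * B
        u[i] = h[i - 1] * lam0[i - 1] * A / w
        v[i] = h[i] * lam1[i] * B / w
        z[i] = [3 * (A * a + B * b) / w for a, b in zip(L[i - 1], L[i])]
    # Thomas algorithm; rows 0 and N are the end tangent conditions.
    diag = [1.0] * (N + 1)
    for i in range(1, N + 1):
        m = u[i] / diag[i - 1]
        diag[i] -= m * v[i - 1]
        z[i] = [zi - m * zp for zi, zp in zip(z[i], z[i - 1])]
    T = [None] * (N + 1)
    T[N] = [c / diag[N] for c in z[N]]
    for i in range(N - 1, -1, -1):
        T[i] = [(c - v[i] * tn) / diag[i] for c, tn in zip(z[i], T[i + 1])]
    return T, h, L

def second_derivative_jump(i, T, h, L, lam0, lam1):
    """Max-norm of Q''(sigma_i^+) - Q''(sigma_i^-), derived via the chain rule."""
    right = [(6 * l - 2 * lam1[i] * h[i] * tn - (6 - 2 * lam1[i]) * h[i] * t)
             / (h[i] * lam0[i]) ** 2
             for l, t, tn in zip(L[i], T[i], T[i + 1])]
    left = [(-6 * l + 2 * lam0[i - 1] * h[i - 1] * tp
             + (6 - 2 * lam0[i - 1]) * h[i - 1] * t)
            / (h[i - 1] * lam1[i - 1]) ** 2
            for l, tp, t in zip(L[i - 1], T[i - 1], T[i])]
    return max(abs(a - b) for a, b in zip(right, left))
```

With all tension parameters equal to 1 the coefficients reduce to the classical slope equations of the $C^2$ cubic spline, so collinear, uniformly spaced data return a constant tangent.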

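We notice that for $\lambda_i^{(0)} = \lambda_i^{(1)} = 1$, system (3.9) reduces to the system for the
computation of classical $C^2$ cubic splines. Moreover, if $\lambda_{k-1}^{(1)} = \lambda_k^{(0)} = \lambda_k$, $k \in \mathcal{I}$, the
curve $\mathbf{C}$ is of class $C^1$ and equation (3.6) reads

$$\frac{h_i^{-2}\ddot{S}_i(0^+) - h_{i-1}^{-2}\ddot{S}_{i-1}(1^-)}{\lambda_i}\,\frac{d}{dt}\mathbf{C}(\sigma_i)
= \frac{d^2}{dt^2}\mathbf{C}(\sigma_i^+) - \frac{d^2}{dt^2}\mathbf{C}(\sigma_i^-).$$

Then (3.6) is equivalent to imposing that the cubic curve (3.1) is a $C^1$-$G^2$ cubic ν-spline
[5, 7, 11] where, from (3.3), for $i \in \mathcal{I}$,

$$\nu_i := \frac{h_i^{-2}\ddot{S}_i(0^+) - h_{i-1}^{-2}\ddot{S}_{i-1}(1^-)}{\lambda_i}
= \frac{(6 - 4\lambda_i - 2\lambda_{i+1})h_i^{-1} + (6 - 2\lambda_{i-1} - 4\lambda_i)h_{i-1}^{-1}}{\lambda_i}. \qquad (3.10)$$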


4  Asymptotic  behavior  and  shape-preservation 

In this section we briefly discuss the asymptotic behavior and the resulting shape-preserving
properties of the curve $\mathbf{Q}$, defined by (3.1)-(3.3), (3.5) and (3.9), as the
tension parameters $\lambda_i^{(0)}, \lambda_i^{(1)}$ approach zero. The following lemma (see also [7]) concerns
the asymptotic behavior of the tangents $\mathbf{T}_i$. We omit the details of the proof, which are
completely analogous to those of Theorem 3 in [9].

Lemma 4.1 The vectors $\mathbf{T}_i$, $i = 0, \dots, N$, obtained from (3.9) are bounded independently
of $\lambda_j^{(0)}, \lambda_j^{(1)}$, $j = 0, \dots, N-1$. Moreover,

$$\lim_{\lambda_{i-1}^{(1)},\,\lambda_i^{(0)} \to 0} \mathbf{T}_i
= \lim_{\lambda_{i-1}^{(1)},\,\lambda_i^{(0)} \to 0} \left[ \frac{h_i(\lambda_i^{(0)})^2}{h_i(\lambda_i^{(0)})^2 + h_{i-1}(\lambda_{i-1}^{(1)})^2}\,\frac{\mathbf{L}_{i-1}}{h_{i-1}}
+ \frac{h_{i-1}(\lambda_{i-1}^{(1)})^2}{h_{i-1}(\lambda_{i-1}^{(1)})^2 + h_i(\lambda_i^{(0)})^2}\,\frac{\mathbf{L}_i}{h_i} \right]
=: (1 - \alpha_i)\frac{\mathbf{L}_{i-1}}{h_{i-1}} + \alpha_i\frac{\mathbf{L}_i}{h_i}, \quad i \in \mathcal{I}. \qquad (4.1)$$

Since the tangents are bounded independently of the tension parameters, from the
previous section we have that $\mathbf{Q}$ approaches the piecewise linear function interpolating
the data as the tension parameters tend to zero. Moreover, each tangent $\mathbf{T}_i$ determined
by (3.9) tends to a strictly convex combination of $\mathbf{L}_{i-1}/h_{i-1}$ and $\mathbf{L}_i/h_i$ as the tension
parameters tend to zero while $\lambda_{i-1}^{(1)}/\lambda_i^{(0)}$ remains bounded and strictly positive.
Due to these two main facts, we are able to easily control the shape of the curve $\mathbf{Q}$ and
to ensure that it reproduces the shape of the data as the tension parameters approach
zero, as we will discuss briefly in the following.

Since $\mathbf{C}$ and $\mathbf{Q}$ only differ by a reparameterization, they have the same image. Thus, as
far as the shape-preserving properties are concerned, we can consider the expression of $\mathbf{C}$.
As noticed in Section 3, if $\lambda_{i-1}^{(1)} = \lambda_i^{(0)}$, $i \in \mathcal{I}$, the curve $\mathbf{C}$ with $\mathbf{T}_i$ obtained by (3.9) is a
$C^1$-$G^2$ cubic ν-spline. In such a case, using (3.10), the careful shape analysis carried out in
[7] and the resulting algorithm can be considered. However, the simple geometric meaning
of the tension parameters $\lambda_i^{(0)}, \lambda_i^{(1)}$, coupled with the “shape-preserving” properties of the
Bezier-Bernstein representation, allows us to more easily establish the shape-preserving
results also for completely general configurations of $\lambda_{i-1}^{(1)}, \lambda_i^{(0)}$. Thus, we express the
curve segment $\mathbf{C}_i(t; \lambda_i^{(0)}, \lambda_i^{(1)})$ in Bezier-Bernstein form:

$$\mathbf{C}_i(t; \lambda_i^{(0)}, \lambda_i^{(1)}) = \sum_{l=0}^{3} \mathbf{C}_{i,l}\binom{3}{l}u^l(1-u)^{3-l},$$

$$\mathbf{C}_{i,0} := \mathbf{I}_i, \quad \mathbf{C}_{i,1} := \mathbf{I}_i + \tfrac{1}{3}h_i\lambda_i^{(0)}\mathbf{T}_i, \quad \mathbf{C}_{i,2} := \mathbf{I}_{i+1} - \tfrac{1}{3}h_i\lambda_i^{(1)}\mathbf{T}_{i+1}, \quad \mathbf{C}_{i,3} := \mathbf{I}_{i+1}.$$

Let us consider first the convexity criteria.


Lemma 4.2 If $\mathbf{N}_i \cdot \mathbf{N}_j \neq 0$ and


c  >  0,  then 


lim  (K(<?j)  •  Nj)(Nj  •  Nj)  >  0. 


Proof:  From  the  properties  of  Bezier  curves  (see  [5])  and  from  (2.1)  and  (3.5) 
sgn(K(<7j)  •  N j)  =  sgn((CM  -  C<;'o)  x  (Ciy2  -  C<  i))  •  Nj 


(1  -  OLi) 


Ni  •  N, 


sgn  ^  T«  x  (hi  -  -  ^^Ti+1  j 

where  sgn(y)  denotes  the  sign  of  y.  Moreover,  from  (4.1) 

lim  (Ti  x  Li)  •  N,  =  (ah  x  L*  +  (1  -oi)^x  lA  -.N,  =  (1~Qi)Ni  •  Nj. 

Af.>,xfUo  ’  3  \  K  hi-!  J  hi- j 

Hence,  we  obtain  the  assertion  if  N*  •  Nj  ±  0.  □ 

The previous lemma ensures that, if the tension parameters are small enough, the third convexity
criterion, (i.3), stated in Section 2 is satisfied. In addition, the sign of $\mathbf{K}(\sigma_k) \cdot \mathbf{N}_j$, $k = i, i+1$,
can be checked considering the Bezier coefficients $\mathbf{C}_{i,l}$, $l = 0, 1, 2, 3$, of $\mathbf{C}_i$. Furthermore,
thanks to the shape-preserving properties of totally positive bases (see [4]), for small values of the
tension parameters the number of changes in sign of $\mathbf{K}(s) \cdot \mathbf{N}_j$, $s \in [\sigma_i, \sigma_{i+1}]$,
is bounded by the number of changes of sign in the pair $\mathbf{K}(\sigma_k) \cdot \mathbf{N}_j$, $k = i, i+1$. Thus,
also the first and the second convexity criteria (i.1) and (i.2) are satisfied if the tension
parameters are small enough.

As far as the torsion is concerned, we recall that the sign of the torsion of a cubic
curve coincides with the sign of the discrete torsion of its Bezier control polygon (see for
example [5]); thus it is not difficult to obtain the following


Lemma 4.3 If $\Delta_i \neq 0$ and


c  >  0,  j  =  +  1,  then 


\(°)  AU)  \(0)  A(i)  \<o)  A0) 

>Ai+iAi+i  u 


$$\tau(s)\Delta_i > 0, \quad s \in [\sigma_i, \sigma_{i+1}].$$


With  similar  arguments  it  is  not  difficult  to  prove  that  also  the  collinearity  and  the 
coplanarity  criteria  stated  in  [7]  are  fulfilled  as  the  tension  parameters  approach  zero. 
We  omit  the  details  for  the  sake  of  brevity. 

Summarizing, from the previous discussion it follows that if the tension parameters are
small enough then the Bezier control polygon of $\mathbf{C}$ reproduces the shape of the data and
the curve $\mathbf{C}$ does the same thanks to the properties of the Bezier-Bernstein representation.
Thus, to obtain an automatic algorithm to compute the $C^2$ interpolant $\mathbf{Q}$ defined by
(3.5), satisfying convexity and torsion criteria, basically we have to perform the following
steps:

(a)  for  a  given  sequence  of  the  tension  parameters  solve  the  system  (3.9)  and  compute 
the  Bezier  coefficients  of  the  resulting  curve  C; 

(b)  check  if  the  control  polygon  of  each  segment  C*  satisfies  the  convexity  and  torsion 
criteria; 

(c)  if  this  is  not  the  case  reduce  the  values  of  the  related  tension  parameters  according 
to  a  given  rule  and  go  to  step  (a). 
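
The control flow of steps (a)-(c) can be sketched as a simple loop. In the Python outline below the linear solve and the shape test are passed in as callbacks, and the reduction rule is taken to be a fixed multiplicative factor; the text only says "a given rule", so that factor, the callback signatures and all names are assumptions of this sketch.

```python
def fit_shape_preserving(lam0, lam1, solve, violations, shrink=0.5, max_iter=100):
    """Iterate steps (a)-(c) until no segment violates the shape criteria.

    solve(lam0, lam1)          -> tangents T (step (a), hypothetical callback)
    violations(T, lam0, lam1)  -> indices of offending segments (step (b))
    shrink                     -> assumed reduction rule for step (c)
    """
    lam0, lam1 = list(lam0), list(lam1)
    for _ in range(max_iter):
        T = solve(lam0, lam1)                 # step (a)
        bad = violations(T, lam0, lam1)       # step (b)
        if not bad:
            return lam0, lam1, T
        for i in bad:                         # step (c)
            lam0[i] *= shrink
            lam1[i] *= shrink
    raise RuntimeError("tension parameters did not settle within max_iter")
```

Since the criteria are met once the tension parameters are small enough, a rule that keeps shrinking the offending parameters is consistent with termination after finitely many passes.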

5  A  graphical  example 

To illustrate the performance of the presented scheme we consider the data proposed
in [7], Example 2, consisting of 20 points with uniform parameterization in [0,1]. End
tangent boundary conditions have been used (see Table 2 in [7]). Figures 1-3 show the
behavior of the obtained $C^2$ curve $\mathbf{Q}$ compared with the classical $C^2$ cubic spline. The
shape-preserving curve $\mathbf{Q}$ is defined by the following sequence of tension parameters

$\lambda_i^{(0)}$ : .6 .6 1 .9 .9 1 1 1 1 1 1 1 1 1 .75 1 1 1 1

$\lambda_i^{(1)}$ : .9 .6 .6 1 .9 .9 1 1 1 1 1 1 1 1 1 1 1 .75 1.


Fig. 1. $C^2$ cubic spline (left) and $\mathbf{Q}$ (right).
Abstract

We discuss here recent developments on the convergence of the q-Bernstein polynomials
$B_nf$, which replace the classical Bernstein polynomial with a one parameter family of
polynomials. In addition, the convergence of iterates and iterated Boolean sums of q-Bernstein
polynomials will be considered. Moreover a q-difference operator $\mathcal{D}_qf$ defined
by $\mathcal{D}_qf = f[x, qx]$ is applied to q-Bernstein polynomials. This gives us some results which
complement those concerning derivatives of Bernstein polynomials. It is shown that, with
the parameter $0 < q \le 1$, if $\Delta^kf_r \ge 0$ then $\mathcal{D}_q^kB_nf \ge 0$. If $f$ is monotonic so is $\mathcal{D}_qB_nf$.
If $f$ is convex then $\mathcal{D}_q^2B_nf \ge 0$.


1  Introduction 

First we begin by introducing some notation to be used. For any fixed real number
$q > 0$, the q-integer $[k]$ is defined as

$$[k] = \begin{cases} (1 - q^k)/(1 - q), & q \neq 1, \\ k, & q = 1, \end{cases}$$

for all positive integers $k$. The term Gaussian coefficient is also used, since they were first
studied by Gauss (see Andrews [1]).

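Let $p(N, M, n)$ denote the number of partitions of a positive integer $n$ into at most $M$
parts, each less than or equal to $N$. Then the Gaussian polynomial, $G(N, M, n)$, appears
as the generating function

$$G(N, M, n) = \begin{bmatrix} N + M \\ M \end{bmatrix} = \sum_{n \ge 0} p(N, M, n)\,q^n.$$

Note that $\begin{bmatrix} n \\ k \end{bmatrix}$, defined by

$$\begin{bmatrix} n \\ k \end{bmatrix} = \begin{cases} \dfrac{[n]!}{[k]!\,[n-k]!}, & n \ge k \ge 0, \\ 0, & \text{otherwise,} \end{cases}$$

where $[n]! = [n][n-1] \cdots [1]$ with $[0]! = 1$, is called a Gaussian polynomial (or q-binomial
coefficient) since it is a polynomial in $q$ of degree $(n-k)k$. The q-binomial coefficients
satisfy the recurrence relations

$$\begin{bmatrix} n+1 \\ k \end{bmatrix} = q^{n-k+1}\begin{bmatrix} n \\ k-1 \end{bmatrix} + \begin{bmatrix} n \\ k \end{bmatrix} \qquad (1.1)$$

and

$$\begin{bmatrix} n+1 \\ k \end{bmatrix} = \begin{bmatrix} n \\ k-1 \end{bmatrix} + q^k\begin{bmatrix} n \\ k \end{bmatrix}. \qquad (1.2)$$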

The following Euler identity can be verified by induction using the recurrence relation (1.1):

$$(1 + x)(1 + qx) \cdots (1 + q^{k-1}x) = \sum_{r=0}^{k} q^{r(r-1)/2}\begin{bmatrix} k \\ r \end{bmatrix} x^r. \qquad (1.3)$$

Phillips [8] introduced a generalization of Bernstein polynomials (q-Bernstein polynomials)
in terms of q-integers

$$B_n(f; x) = \sum_{r=0}^{n} f_r\begin{bmatrix} n \\ r \end{bmatrix} x^r \prod_{s=0}^{n-r-1}(1 - q^sx), \qquad (1.4)$$

where $f_r = f\left(\frac{[r]}{[n]}\right)$ and an empty product denotes 1. When $q = 1$, (1.4) reduces to the
classical Bernstein polynomials. The $B_n(f; x)$ generalizes many properties of classical
Bernstein polynomials. Firstly, generalized Bernstein polynomials satisfy the end point
interpolation

$$B_n(f; 0) = f(0), \qquad B_n(f; 1) = f(1).$$

Phillips [8] also states the generalization of the well known forward difference form (see Davis
[3]) of the classical Bernstein polynomials by the following theorem.

Theorem 1.1 The generalized Bernstein polynomial, defined by (1.4), may be expressed
in the q-difference form

$$B_n(f; x) = \sum_{r=0}^{n}\begin{bmatrix} n \\ r \end{bmatrix}\Delta^rf_0\,x^r, \qquad (1.5)$$

where $\Delta^rf_i = \Delta^{r-1}f_{i+1} - q^{r-1}\Delta^{r-1}f_i$ for $r \ge 1$ and $\Delta^0f_i = f_i$.

It is easily verified by induction that q-differences satisfy

$$\Delta^rf_i = \sum_{k=0}^{r}(-1)^kq^{k(k-1)/2}\begin{bmatrix} r \\ k \end{bmatrix}f_{r+i-k}. \qquad (1.6)$$

Using the q-difference form of the q-Bernstein polynomials (1.5), one may show that
q-Bernstein polynomials reproduce linear functions, since $B_n(1; x) = 1$ and $B_n(x; x) = x$.
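
The definitions of this section translate directly into code. The Python sketch below evaluates the q-Bernstein polynomial both from (1.4) and from the q-difference form (1.5), using exact rational arithmetic so that the two forms can be compared without rounding. The function names are hypothetical, and the code assumes $q$ and $x$ are given as integers or `Fraction` values.

```python
from fractions import Fraction

def q_int(k, q):
    """q-integer [k] = 1 + q + ... + q^(k-1); equals k when q = 1."""
    return sum(q**j for j in range(k))

def q_fact(k, q):
    r = Fraction(1)
    for j in range(1, k + 1):
        r *= q_int(j, q)
    return r

def q_binom(n, k, q):
    if not 0 <= k <= n:
        return Fraction(0)
    return q_fact(n, q) / (q_fact(k, q) * q_fact(n - k, q))

def q_bernstein(f, n, q, x):
    """B_n(f; x) evaluated from the product form (1.4)."""
    q, x = Fraction(q), Fraction(x)
    total = Fraction(0)
    for r in range(n + 1):
        prod = Fraction(1)
        for s in range(n - r):
            prod *= 1 - q**s * x
        total += f(q_int(r, q) / q_int(n, q)) * q_binom(n, r, q) * x**r * prod
    return total

def q_bernstein_diff_form(f, n, q, x):
    """B_n(f; x) evaluated from the q-difference form (1.5)."""
    q, x = Fraction(q), Fraction(x)
    row = [f(q_int(i, q) / q_int(n, q)) for i in range(n + 1)]  # Delta^0 f_i
    total = row[0]
    for r in range(1, n + 1):
        # Delta^r f_i = Delta^(r-1) f_(i+1) - q^(r-1) Delta^(r-1) f_i
        row = [row[i + 1] - q**(r - 1) * row[i] for i in range(len(row) - 1)]
        total += q_binom(n, r, q) * row[0] * x**r
    return total
```

For example, with `q = Fraction(1, 2)` both functions return the same value for any `f`, and a linear `f` is reproduced exactly.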


2  Convergence 

In the discussion of the uniform convergence of the q-Bernstein operator, the Bohman-Korovkin
Theorem (see Cheney [2]) is used as in the classical case. The Bohman-Korovkin
Theorem states that for a linear monotone operator $\mathcal{L}_n$, the convergence of
$\mathcal{L}_nf \to f$ for $f(x) = 1, x, x^2$ is sufficient for the sequence of operators $\mathcal{L}_n$ to have the
uniform convergence property $\mathcal{L}_nf \to f$, $\forall f \in C[0, 1]$. Observe that the q-Bernstein operator
is a monotone linear operator for $0 < q \le 1$. For a fixed value of $q$ with $0 < q < 1$,

$$[n] \to \frac{1}{1-q} \quad \text{as } n \to \infty.$$

Notice that, since $B_n(x^2; x) = x^2 + \frac{x(1-x)}{[n]}$, $B_n(x^2; x)$ does not converge to $x^2$. Phillips
[8] studies the uniform convergence of q-Bernstein polynomials.

Theorem 2.1 Let $q = q_n$ satisfy $0 < q_n < 1$ and let $q_n \to 1$ as $n \to \infty$. Then,

$$B_n(f; x) \rightrightarrows f(x), \quad \forall f(x) \in C[0, 1].$$

The degree of q-Bernstein approximation to a bounded function on [0, 1] may be described
in terms of the modulus of continuity with the following theorem.

Theorem  2.2  If  f  is  bounded  on  [0, 1]  and  Bnf  denotes  the  generalized  Bernstein 
operator  associated  with  f  defined  by  (1.4),  then 

An error estimate for the convergence of q-Bernstein polynomials is given in Phillips [8]
by the Voronovskaya type theorem.

Theorem 2.3 Let $f$ be bounded on [0, 1] and let $x_0$ be a point of [0, 1] at which $f''(x_0)$
exists. Further, let $q = q_n$ satisfy $0 < q_n < 1$ and let $q_n \to 1$ as $n \to \infty$. Then the rate
of convergence of the sequence of generalized Bernstein polynomials is governed by

$$\lim_{n \to \infty}[n](B_n(f; x_0) - f(x_0)) = \tfrac{1}{2}x_0(1 - x_0)f''(x_0).$$

It is well known that the classical Bernstein polynomials $B_nf$ provide simultaneous
approximation of the function and its derivatives. That is, if $f \in C^p[0, 1]$, then

$$\lim_{n \to \infty} B_n^{(p)}(f; x) = f^{(p)}(x)$$

uniformly on [0, 1]. It is worthwhile to examine whether this property holds for q-Bernstein
polynomials. Phillips [7] proved that the $p$th derivative of q-Bernstein polynomials converges
uniformly on [0, 1] to the $p$th derivative of $f$ under some restrictions on the parameter $q$.
This property results from the generalization of the following theorem.

Theorem 2.4 Let $f \in C^1[0, 1]$ and let the sequence $(q_n)$ be chosen so that the sequence
$(\epsilon_n)$ converges to zero from above faster than $(1/3^n)$, where

$$\epsilon_n = \frac{n}{1 + q_n + q_n^2 + \cdots + q_n^{n-1}} - 1.$$

Then the sequence of derivatives of the generalized Bernstein polynomials, $B_n'f$, converges
uniformly on [0,1] to $f'(x)$.

Up to now the convergence of q-Bernstein polynomials has been examined by taking a sequence
$q = q_n$ such that $q_n \to 1$ as $n \to \infty$. In recent developments, the convergence
of q-Bernstein polynomials is examined for fixed real $q$, $0 < q < 1$, and for $q > 1$. It is
proved in Oruç and Tuncer [6] that for a fixed $q$, $0 < q < 1$, the uniform convergence
holds if and only if $f$ is linear on the interval [0, 1]. Moreover, if $q > 1$, $B_nf \to f$ as
$n \to \infty$ if $f$ is a polynomial.

Theorem 2.5 Let $q > 1$ be a fixed real number. Then, for any polynomial $p$,

$$\lim_{n \to \infty} B_n(p; x) = p(x).$$

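For any fixed integer $i$, the q-Bernstein polynomials of monomials (see Goodman
et al. [4]) can be written explicitly as

$$B_n(x^i; x) = \sum_{j=0}^{i}\lambda_j[n]^{j-i}S_q(i, j)\,x^j, \qquad (2.1)$$

where

$$\lambda_j = \prod_{r=0}^{j-1}\left(1 - \frac{[r]}{[n]}\right),$$

an empty product denotes 1, and

$$S_q(i, j) = \frac{1}{[j]!\,q^{j(j-1)/2}}\sum_{r=0}^{j}(-1)^rq^{r(r-1)/2}\begin{bmatrix} j \\ r \end{bmatrix}[j-r]^i, \quad 0 \le j \le i, \qquad (2.2)$$

is the Stirling polynomial of the second kind. Thus for any polynomial $p$ of degree $m$, one
may write

$$B_n(p; x) = \mathbf{a}^TA\mathbf{x}, \qquad (2.3)$$

where $\mathbf{a}$ is the vector whose elements are the coefficients of $p$, $A$ is an $(m+1) \times (m+1)$
lower triangular matrix with the elements

$$a_{i,j} = \begin{cases} \lambda_j[n]^{j-i}S_q(i, j), & 0 \le j \le i, \\ 0, & i < j, \end{cases} \qquad (2.4)$$

and $\mathbf{x}$ is the vector whose elements form the standard basis for the space of polynomials
$P_m$ of degree $m$.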

Lemma 2.1 Let $0 < q < 1$ be a fixed real number. Then

$$\lim_{n \to \infty} B_n(p; x) = p(x)$$

if and only if $p(x)$ is linear.

This lemma can be generalized for any function $f \in C[0, 1]$.

Theorem 2.6 Let $0 < q < 1$ be a fixed real number and $f \in C[0, 1]$. Then

$$\lim_{n \to \infty} B_n(f; x) = f(x)$$

if and only if $f(x)$ is linear.
3 The iterates

The iterates of classical Bernstein polynomials were first studied by Kelisky and Rivlin
[5]. The authors proved that iterates of Bernstein polynomials converge to linear end
point interpolants on [0, 1]. Several generalizations of the result due to Kelisky and Rivlin
have been considered by many authors; see Sevy [9] and Wenz [10]. The recent result is
the convergence of iterates of generalized Bernstein polynomials. It is proved in Oruç and
Tuncer [6] that the q-Bernstein polynomials do preserve the convergence property of iterates
of classical Bernstein polynomials. The iterates of generalized Bernstein polynomials
are defined by

$$B_n^{M+1}(f; x) = B_n(B_n^M(f; x); x), \quad M = 1, 2, \dots, \qquad (3.1)$$

where $B_n^1(f; x) = B_n(f; x)$.


Theorem 3.1 Let $q > 0$ be a fixed real number. Then

$$\lim_{M \to \infty} B_n^M(f; x) = f(0) + (f(1) - f(0))x. \qquad (3.2)$$

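
Theorem 3.1 is easy to observe numerically. Since $B_n(g; \cdot)$ depends on $g$ only through its values at the nodes $[r]/[n]$, the iterates can be tracked by repeatedly applying a fixed $(n+1) \times (n+1)$ matrix to the vector of node values. The Python sketch below does this in floating point; the matrix formulation and the function names are choices made for this illustration, not notation from the paper.

```python
def q_int(k, q):
    """q-integer [k] = 1 + q + ... + q^(k-1)."""
    return sum(q**j for j in range(k))

def q_binom(n, k, q):
    num = den = 1.0
    for j in range(k):
        num *= q_int(n - j, q)
        den *= q_int(j + 1, q)
    return num / den

def bernstein_matrix(n, q):
    """Nodes x_i = [i]/[n] and M[i][r] = weight of f_r in B_n(f; x_i), from (1.4)."""
    xs = [q_int(i, q) / q_int(n, q) for i in range(n + 1)]
    M = []
    for x in xs:
        row = []
        for r in range(n + 1):
            prod = 1.0
            for s in range(n - r):
                prod *= 1 - q**s * x
            row.append(q_binom(n, r, q) * x**r * prod)
        M.append(row)
    return xs, M

def iterate_values(f, n, q, count):
    """Values of the iterate B_n^count f at the nodes x_i."""
    xs, M = bernstein_matrix(n, q)
    vals = [f(x) for x in xs]
    for _ in range(count):
        vals = [sum(m * v for m, v in zip(row, vals)) for row in M]
    return xs, vals
```

After enough iterations the node values agree, up to rounding, with those of the straight line through $(0, f(0))$ and $(1, f(1))$.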

Let $A$ and $B$ be operators; then the Boolean sum of $A$ and $B$ is defined to be

$$A \oplus B = A + B - A \circ B.$$

We will be concerned with iterated Boolean sums of the generalized Bernstein polynomials
in the form $B_n \oplus B_n \oplus \cdots \oplus B_n$ and will denote such an $M$-fold Boolean sum of
the generalized Bernstein operators by $\oplus^MB_n$. Sevy [9] and Wenz [10] proved that the
limit of iterated Boolean sums of Bernstein polynomials is the interpolation polynomial
with respect to the nodes $\left(\frac{i}{n}, f\left(\frac{i}{n}\right)\right)$, $i = 0, \dots, n$, as $M \to \infty$. The second theorem of this
section will give a result for the convergence of iterates of Boolean sums of generalized
Bernstein polynomials. It is proved in Oruç and Tuncer [6] that the iterates of Boolean
sums of q-Bernstein polynomials converge to the interpolating polynomial at the nodes
$x_i = [i]/[n]$, $i = 0, \dots, n$.

Theorem 3.2 The iterated Boolean sum of the q-Bernstein operator, $\oplus^MB_n(f; x)$, associated
with the function $f(x) \in C[0, 1]$ converges to the interpolating polynomial $L_nf$
of degree $n$ of $f(x)$ at the points $x_i = [i]/[n]$, $i = 0, 1, \dots, n$.

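4  A difference operator $\mathcal{D}_q$ on generalized Bernstein polynomials

Given any function $f(x)$ and $q \in \mathbb{R}$ we define the operator $\mathcal{D}_q$ by

$$\mathcal{D}_q f(x) = \frac{f(qx) - f(x)}{qx - x}. \tag{4.1}$$

Thus $\mathcal{D}_q f(x)$ is simply a divided difference, $\mathcal{D}_q f(x) = f[x, qx]$. Note
that, for a function $f$ and non-negative integer $k$,

$$f[x, qx, \ldots, q^k x] = \frac{1}{[k]!} \mathcal{D}_q^k f(x).$$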
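
The relation between repeated application of the difference operator and divided
differences on the points $x, qx, \ldots, q^k x$ can be checked numerically. The sketch
below is only an illustration: the helper names are hypothetical, and the q-factorial is
taken to be $[k]! = [1][2]\cdots[k]$ with $[r] = 1 + q + \cdots + q^{r-1}$.

```python
def d_q(f, q):
    """Return the function x -> (f(qx) - f(x)) / (qx - x)."""
    return lambda x: (f(q * x) - f(x)) / (q * x - x)


def d_q_power(f, q, k):
    """Apply the difference operator k times."""
    g = f
    for _ in range(k):
        g = d_q(g, q)
    return g


def divided_difference(f, points):
    """Classical divided difference f[x_0, ..., x_k] by recursion."""
    if len(points) == 1:
        return f(points[0])
    upper = divided_difference(f, points[1:])
    lower = divided_difference(f, points[:-1])
    return (upper - lower) / (points[-1] - points[0])


def q_factorial(k, q):
    """q-factorial [k]! = [1][2]...[k]."""
    result = 1.0
    for r in range(1, k + 1):
        result *= sum(q ** s for s in range(r))
    return result
```

With `f = lambda t: t**4 - 2*t`, `q = 0.7` and `x = 1.3`, the quantities
`divided_difference(f, [x * q**j for j in range(4)])` and
`d_q_power(f, q, 3)(x) / q_factorial(3, q)` should agree up to rounding.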
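Theorem 4.1 For any integer $0 \le k \le n$,

$$\mathcal{D}_q^k B_n(f; x) = [n] \cdots [n - k + 1] \sum_{r=0}^{n-k} \Delta^k f_r \begin{bmatrix} n - k \\ r \end{bmatrix} x^r \prod_{s=k}^{n-r-1} (1 - q^s x).$$

Proof: Recall the q-difference form of generalized Bernstein polynomials (1.5) and
apply the operator $\mathcal{D}_q$ to $B_n(f; x)$ repeatedly $k$ times to get

$$\mathcal{D}_q^k B_n(f; x) = \sum_{r=0}^{n-k} \frac{[n]!}{[n - k - r]!\,[r]!} \Delta^{k+r} f_0 \, x^r. \tag{4.2}$$

It will be useful to express $\Delta^{k+r}$ in terms of $\Delta^k$. One may prove by induction
on $m$ that, for $0 \le m \le n - k$, we may write

$$\Delta^{m+k} f_i = \sum_{t=0}^{m} (-1)^t q^{t(t+2k-1)/2} \begin{bmatrix} m \\ t \end{bmatrix} \Delta^k f_{m+i-t}.$$

Now applying the latter identity to (4.2) gives

$$\mathcal{D}_q^k B_n(f; x) = \sum_{r=0}^{n-k} \sum_{t=0}^{r} (-1)^t q^{t(t+2k-1)/2} \frac{[n]!}{[n - k - r]!\,[r]!} \begin{bmatrix} r \\ t \end{bmatrix} \Delta^k f_{r-t} \, x^r. \tag{4.3}$$

Writing $m = r - t$,

$$\frac{[n]!}{[n - k - m - t]!\,[m + t]!} \begin{bmatrix} m + t \\ t \end{bmatrix} = \frac{[n]!}{[n - k - m]!\,[m]!} \begin{bmatrix} n - k - m \\ t \end{bmatrix} \tag{4.4}$$

and putting (4.4) in (4.3) we obtain

$$\mathcal{D}_q^k B_n(f; x) = \sum_{m=0}^{n-k} \frac{[n]!}{[n - k - m]!\,[m]!} \Delta^k f_m \, x^m \sum_{t=0}^{n-k-m} (-1)^t q^{t(t+2k-1)/2} \begin{bmatrix} n - k - m \\ t \end{bmatrix} x^t.$$

Now, it can easily be derived from the generalized binomial expansion (1.3), on replacing
$x$ by $q^k x$, that

$$\prod_{s=k}^{n-m-1} (1 - q^s x) = \sum_{t=0}^{n-k-m} (-1)^t q^{t(t+2k-1)/2} \begin{bmatrix} n - k - m \\ t \end{bmatrix} x^t.$$

This completes the proof.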


From Theorem 4.1 we see that, with $0 < q < 1$, if $\Delta^k f_r \ge 0$ for $0 \le r \le n - k$ then
$\mathcal{D}_q^k B_n(f; x) \ge 0$. If $f$ is convex on $0 \le x \le 1$ then $\mathcal{D}_q^2 B_n(f; x) \ge 0$
for $0 < q < 1$. If $f$ is increasing then $\mathcal{D}_q B_n(f; x) \ge 0$ for $0 < q < 1$.


Acknowledgment: The second author is supported by the Institute of Natural and
Applied Sciences of D.E.U., and this research is partially supported by the grant AFS
0922.20.01.02.


Halil  Orug  and  Necibe  Tuncer 


Bibliography 

1. G. E. Andrews, The Theory of Partitions, Cambridge University Press, Cambridge,
1998.

2. E. W. Cheney, Introduction to Approximation Theory, AMS Chelsea, Providence,
1981.

3. P. J. Davis, Interpolation and Approximation, Dover Publications, New York, 1975.

4. T. N. T. Goodman, H. Oruç, and G. M. Phillips, Convexity and generalized Bernstein
polynomials, Proc. Edin. Math. Soc. 42 (1999), 179-190.

5. R. P. Kelisky and T. J. Rivlin, Iterates of Bernstein polynomials, Pacific J. Math.
21 (1967), 511-520.

6. H. Oruç and N. Tuncer, On the convergence and iterates of q-Bernstein polynomials,
J. Approx. Theory, to appear.

7. G. M. Phillips, On generalized Bernstein polynomials, Numerical Analysis, D. Griffiths
and G. Watson eds. (1996), 263-269.

8. G. M. Phillips, Bernstein polynomials based on the q-integers, The heritage of P. L.
Chebyshev: a Festschrift in honor of the 70th birthday of T. J. Rivlin, Ann. Numer.
Math. 4 (1997), 511-518.

9. J. C. Sevy, Lagrange and least-square polynomials as limits of linear combinations
of iterates of Bernstein and Durrmeyer polynomials, J. Approx. Theory 80 (1995),
267-271.

10. H. J. Wenz, On the limits of (linear combinations of) iterates of linear operators,
J. Approx. Theory 89 (1997), 219-237.


Uniform Powell-Sabin splines for the polygonal hole problem

Joris Windmolders and Paul Dierckx

Department of Computer Sciences, Kath. University Leuven, Belgium.

Joris.Windmolders@cs.kuleuven.ac.be, Paul.Dierckx@cs.kuleuven.ac.be


Abstract 

An  algorithm  is  described  for  smoothly  filling  in  a  polygonal  hole  in  a  surface,  with  a 
parametric uniform Powell-Sabin spline surface patch. It uses interpolation and subdivision
techniques for iteratively determining an approximating solution. No assumptions
are  made  about  the  surrounding  surface.  The  user  has  to  provide  routines  for  calculating 
the  curve  points  and  the  unit  surface  normal  along  the  edge,  as  well  as  the  unit  tangent 
vector  of  the  edge  curves,  parametrized  on  the  unit  interval. 


1  Introduction 

A classical problem in CAGD is to fill in a hole bounded by a set of surfaces. This
problem has already been addressed in the literature (e.g. [1, 2, 4]). In most cases,
assumptions are made on the bounding surfaces. In this paper, we present an algorithm
for filling in a 3-, 4-, 5- or 6-sided hole that makes no assumptions on the surrounding
surfaces, and is therefore generally applicable. On the other hand, the filling patch
will meet the given boundary curves only approximately. The input of our algorithm (see
Figure 1) consists of the boundary curves $p$, which join at their endpoints. Furthermore,
the user should provide the unit tangent vector $\gamma$ to the boundary curves at any point,
and the unit normal vector $n$ to the surrounding surface at any curve point except the
endpoints, where only the tangent vectors of the joining curves are needed (see Figure
1 again). For other (interior) curve points, our algorithm will calculate a unit vector
$\delta = n \times \gamma$, which will be called the (unit) cross-boundary tangent vector. It shall
be referred to as if it were provided by the user. We will calculate a filling surface patch
that interpolates the user supplied boundary curves and has the same surface normal in
a number of points. This will leave us some degrees of freedom, which we will use to fit
the curve and the cross-boundary tangent vector in between each pair of interpolation
points. In Section 2 we briefly recall the basic properties of uniform Powell-Sabin splines.
Section 3 explains how we can benefit from these properties to use UPS-splines for the
polygonal hole problem. Section 4 explains our algorithm in detail. Finally we remark
that in the pictures we will denote 2D and 3D entities interchangeably; therefore most
pictures reflect the situation only schematically.

Fig. 1. User supplied data.

2  Uniform  Powell-Sabin  splines 

This section recalls the main properties of uniform Powell-Sabin splines. For details,
we refer to the original papers [3, 5].

By $S_2^1(\Delta^*)$ we denote the linear space of uniform Powell-Sabin splines (in the sequel
called UPS-splines), i.e., piecewise quadratic polynomials on a uniform triangulation $\Delta$
(which means that all triangles are equilateral and have the same size) of a polygon $\Omega$,
where $\Delta^*$ is a PS-refinement of $\Delta$. The boundary of $\Omega$ will be called $\partial\Omega$, whereas
the boundary of the triangulation will be referred to as $\partial\Delta$. The vertices of $\Delta$ are
denoted $V_i$, $i = 1, \ldots, n$, and its triangles are $\rho_i$, $i = 1, \ldots, m$. These splines have
global $C^1$-continuity on $\Delta^*$. Any $s(u, v)$ has a unique B-spline representation

$$s(u, v) = \sum_{i=1}^{n} \sum_{j=1}^{3} c_{i,j} B_i^j(u, v), \quad (u, v) \in \Omega, \tag{2.1}$$

where the locally supported basis functions form a convex partition of unity and
$c_{i,j} \in \mathbb{R}^3$ are the control points. It follows that $s(u, v)$ belongs to the convex hull of
$\{c_{i,j}\}_{i,j}$. Furthermore, one can prove that the control triangles, defined as
$T_i(c_{i,1}, c_{i,2}, c_{i,3})$, $i = 1, \ldots, n$, are tangent to the surface at $s(V_i)$. Due to the local
support of $B_i^j$, a change to $c_{i,j}$ will only affect $s(u, v)|_{M_i}$, i.e., the restriction of
$s(u, v)$ to the molecule $M_i$ of $V_i$, being the set of triangles $\rho_j$ that have $V_i$ as a vertex.
This indicates that we have a useful representation for $C^1$-continuous surfaces, without
being restricted to a rectangular domain, and still enjoying the interesting features of
the classical B-spline representation for tensor product splines.

2.1  Subdivision 

In [5] we present a subdivision scheme for UPS-splines. Let $\Delta_r$ be a uniform refinement
of $\Delta$, obtained by midedge subdivision. For a given $s(u, v)$ on $\Delta$, the representation
(2.1) on $\Delta_r$ can be calculated using convex barycentric combinations of the control points
only. First, a new control triangle along each edge $V_iV_j$ is calculated as illustrated in
Figure 2, left, for the bottom edge of a triangle $\rho_l(V_i, V_j, V_k) \in \Delta$:

$$\begin{cases} c_1 = \tfrac{1}{2}(c_{i,2} + c_{i,3}) \\ c_2 = \tfrac{1}{2} c_{j,1} + \tfrac{1}{4}(c_{i,2} + c_{j,2}) \\ c_3 = \tfrac{1}{2} c_{j,1} + \tfrac{1}{4}(c_{i,3} + c_{j,3}). \end{cases} \tag{2.2}$$

Next, the control triangles at the original vertices are rescaled: for example,

$$\begin{cases} c'_{i,1} = \tfrac{2}{3} c_{i,1} + \tfrac{1}{6}(c_{i,2} + c_{i,3}) \\ c'_{i,2} = \tfrac{2}{3} c_{i,2} + \tfrac{1}{6}(c_{i,3} + c_{i,1}) \\ c'_{i,3} = \tfrac{2}{3} c_{i,3} + \tfrac{1}{6}(c_{i,1} + c_{i,2}). \end{cases} \tag{2.3}$$

Fig. 2. Subdivision and Bezier points.

They are still tangent to the surface at their barycenter, but their area is only a quarter
that of the former control triangles. Therefore they fit the surface more tightly.
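
The rescaling of a control triangle can be sketched in a few lines of Python. The sketch
assumes that the rescaling rule uses the weights 2/3 and 1/6, i.e. that each control
point is moved halfway towards the barycenter; the function names are hypothetical.
Under that assumption the barycenter is preserved and the area shrinks by a factor of 4.

```python
import math


def rescale_control_triangle(c1, c2, c3):
    """Shrink a control triangle towards its barycenter by a factor 1/2.

    Each new point is 2/3 of the old point plus 1/6 of each of the other two.
    """
    def combine(a, b, c):
        return tuple(2.0 * ak / 3.0 + (bk + ck) / 6.0
                     for ak, bk, ck in zip(a, b, c))

    return combine(c1, c2, c3), combine(c2, c3, c1), combine(c3, c1, c2)


def barycenter(c1, c2, c3):
    """Barycenter of the three control points."""
    return tuple((a + b + c) / 3.0 for a, b, c in zip(c1, c2, c3))


def area(c1, c2, c3):
    """Area of a triangle in 3D: half the norm of the edge cross product."""
    u = [b - a for a, b in zip(c1, c2)]
    v = [b - a for a, b in zip(c1, c3)]
    cross = (u[1] * v[2] - u[2] * v[1],
             u[2] * v[0] - u[0] * v[2],
             u[0] * v[1] - u[1] * v[0])
    return 0.5 * math.sqrt(sum(x * x for x in cross))
```

Applying `rescale_control_triangle` to any non-degenerate triangle and comparing
`barycenter` and `area` before and after gives a quick check of both properties.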


2.2  The  piecewise  Bezier  representation 


Another important property of the B-spline representation for UPS-splines is that
the piecewise Bezier representation can be calculated from (2.1) using simple convex
barycentric combinations of the control points. In particular, focus on an edge $V_iV_j$ of
$\Delta$ (see Figure 2, right). The Bezier points of the edge curve can be found from:

$$s(V_i) = p_i = \tfrac{1}{3}(c_{i,1} + c_{i,2} + c_{i,3}), \quad s(V_j) = p_j = \tfrac{1}{3}(c_{j,1} + c_{j,2} + c_{j,3}), \tag{2.4}$$

$$u_i = \tfrac{1}{2}(c_{i,2} + c_{i,3}), \quad u_j = \tfrac{2}{3} c_{j,1} + \tfrac{1}{6}(c_{j,2} + c_{j,3}), \quad r_{i,j} = \tfrac{1}{2}(u_i + u_j). \tag{2.5}$$


This is a piecewise quadratic Bezier curve, which means that $p_i$, $r_{i,j}$ and $p_j$ are surface
points, and that $u_i - p_i$ and $p_j - u_j$ are tangent to the surface at $p_i$, resp. $p_j$. Assuming
a (counterclockwise) ordering of the boundary vertices $V_i \in \partial\Delta$, the edge curve from
$s(V_i)$ to the next adjacent point $s(V_j)$ will be denoted $e_i(u, v)$.


3  Application  to  the  polygonal  hole  problem 

Recall that our goal is to calculate a UPS-spline filling a hole in a surface, given by a
set of bounding curves (denoted $p$), their derivatives $\gamma$ and the cross-boundary tangent
vectors $\delta$. The UPS-patch will fit these curves approximately along its boundary. In the
first place, interpolation of the given data at the vertices $V_i \in \partial\Delta$ is achieved. This leaves
some degrees of freedom allowing to fit the given curves. In the sequel we shall denote
the user supplied data, evaluated at $V_i$, by $(p_i, \gamma_i, \delta_i)$.

Fig. 3. Tangent and cross-boundary tangent vectors.

3.1  Interpolating  UPS-splines  and  degrees  of  freedom 

In order to obtain interpolation we determine a control triangle $T_i$ in the tangent plane
$p_i + \varepsilon\gamma_i + \nu\delta_i$, $\varepsilon, \nu \in \mathbb{R}$, such that $s(V_i) = p_i$. Curve point interpolation is
simply expressed by (2.4). Furthermore, we let the tangent to $e_i$ at $V_i$ be parallel to $\gamma_i$:

$$u_i - p_i = \tfrac{1}{6}(c_{i,2} + c_{i,3}) - \tfrac{1}{3} c_{i,1} = \tfrac{1}{2}\alpha_i\gamma_i, \tag{3.1}$$

where $\alpha_i$ is a scaling factor. Next, we need the cross-boundary tangent vector of $s(u, v)$
at $V_i$ to be parallel to $\delta_i$. Mapping the cross-boundary vector $d$ in the domain plane (see
Figure 2, right) onto the control triangle yields a vector parallel to $c_{i,2} - c_{i,3}$:

$$c_{i,2} - c_{i,3} = 2\beta_i\delta_i, \tag{3.2}$$

where $\beta_i$ is again a scaling factor.

Solving (2.4), (3.1) and (3.2) for the control points in terms of the unknowns $\alpha_i$ and
$\beta_i$ (further called the α- and β-factors) yields

$$\begin{aligned} c_{i,1} &= p_i - \alpha_i\gamma_i \\ c_{i,2} &= p_i + \tfrac{1}{2}\alpha_i\gamma_i + \beta_i\delta_i \\ c_{i,3} &= p_i + \tfrac{1}{2}\alpha_i\gamma_i - \beta_i\delta_i. \end{aligned} \tag{3.3}$$

These equations ensure that $s(u, v)$ interpolates the given data at $V_i \in \partial\Delta$, and leave
us two degrees of freedom per vertex ($\alpha_i$ and $\beta_i$). These scaling factors are related to
the size of the control triangle. For example, subdivision by (2.3) divides $\alpha_i$ and $\beta_i$ by a
factor of 2.
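
A small Python sketch can make the role of the two scaling factors concrete. It builds
a control triangle from the data at one vertex and the two factors, assuming the
control points take the form $p - \alpha\gamma$ and $p + \tfrac{1}{2}\alpha\gamma \pm \beta\delta$ (the form obtained
by solving the interpolation conditions for a barycenter at $p$); the function name is
hypothetical.

```python
def control_triangle(p, gamma, delta, alpha, beta):
    """Control points of the triangle at a boundary vertex.

    p     : curve point to interpolate (barycenter of the triangle)
    gamma : unit tangent vector of the boundary curve
    delta : unit cross-boundary tangent vector
    alpha, beta : the two scaling factors (degrees of freedom)
    """
    c1 = tuple(pk - alpha * gk for pk, gk in zip(p, gamma))
    c2 = tuple(pk + 0.5 * alpha * gk + beta * dk
               for pk, gk, dk in zip(p, gamma, delta))
    c3 = tuple(pk + 0.5 * alpha * gk - beta * dk
               for pk, gk, dk in zip(p, gamma, delta))
    return c1, c2, c3
```

For any choice of the two factors the barycenter of the returned triangle is `p`, the
difference of the last two points is `2 * beta * delta`, and halving both factors gives
the triangle shrunk by one half about `p`.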

3.2  The  fitting  equations 

We will now use these degrees of freedom to fit the user supplied data in between each
pair of adjacent interpolating vertices $V_i, V_j \in \partial\Delta$. First, the α-factors at $V_i$ and $V_j$ are
determined by trying to interpolate the curve $p$ at the edge midpoint $V_{i,j} = \tfrac{1}{2}(V_i + V_j)$.
From Section 2.2, the interpolation condition reads $r_{i,j} = \tfrac{1}{2}(u_i + u_j) = p_{i,j}$, where $p_{i,j}$
is the given curve point. Taking (2.5) and (3.3) into account, we have

$$\alpha_i\gamma_i - \alpha_j\gamma_j = 4p_{i,j} - 2(p_i + p_j) = q_{i,j}. \tag{3.4}$$

This is a system of 3 equations with (at most) 2 unknowns. It can be solved in the least
squares sense.

Fig. 4. Consecutive iteration steps.

Next, the β-factors at $V_i$ and $V_j$ are obtained by fitting the cross-boundary tangent
vector at $V_{i,j}$. First, we derive a subdivision rule for the β-factors at the vertices of $\Delta$
from (2.2) and (3.2):

$$\beta'_{i,j}\delta'_{i,j} = \tfrac{1}{4}(\beta_i\delta_i + \beta_j\delta_j), \tag{3.5}$$

where $\delta'_{i,j}$ is the cross-boundary tangent vector to $s(u, v)$ at $V_{i,j}$. This $\beta'_{i,j}$-factor belongs
to a finer subdivision level than $\beta_i$ and $\beta_j$, so we have to scale it up by a factor of 2. The
interpolation condition then is

$$\beta_{i,j}\delta_{i,j} = \tfrac{1}{2}(\beta_i\delta_i + \beta_j\delta_j). \tag{3.6}$$

Note that $\delta_{i,j}$ has been used instead of $\delta'_{i,j}$. This is again an overdetermined system
which can be solved in the least squares sense.


4  The  algorithm 


We  will  restrict  the  figures  illustrating  the  algorithm  to  the  case  of  a  triangular  hole, 
although  the  algorithm  is  immediately  applicable  to  cases  with  4,  5  and  6  boundary 
curves  as  well  (see  Section  4.4). 


The  idea  is  to  calculate,  during  a  pre-iteration  step,  an  initial  solution  which  is  smooth, 
but  in  general  not  close  enough,  and  to  refine  this  approximation  iteratively  to  obtain 
a  better  fit  to  the  given  curves  until  a  certain  stopping  criterion  is  satisfied.  Finally, 
during  a  post-iteration  step,  the  interior  control  triangles  are  calculated,  actually  filling 
the  hole.  Figure  4  illustrates  this:  imagine  a  pre-iteration  step,  two  refinement  steps  and 
a  post-iteration  step.  The  control  triangles  added  during  a  particular  step  have  been 
shaded. 


4.1  An  initial  solution 

The initial solution (Figure 4, leftmost) is easily obtained by solving (3.4) in the least
squares sense for each edge $V_iV_j$. If we assume that $\gamma_i \ne \gamma_j$, then

$$\alpha_i = \frac{1}{D}\big((\gamma_i \cdot q_{i,j}) - (\gamma_j \cdot q_{i,j})(\gamma_i \cdot \gamma_j)\big), \tag{4.1}$$

$$\alpha_j = \frac{1}{D}\big(-(\gamma_j \cdot q_{i,j}) + (\gamma_i \cdot q_{i,j})(\gamma_i \cdot \gamma_j)\big), \tag{4.2}$$

where $D = 1 - (\gamma_i \cdot \gamma_j)^2$. This yields two α-factors per vertex: one for each boundary edge
incident to that vertex. Therefore, $T_i$ is completely determined. The β-factors can
be calculated by writing (3.3) for both edges incident with the vertex and eliminating
$c_2$, respectively $c_1$, e.g., for Figure 3, right,

$$\beta_1 = \alpha_2(\gamma_2 \cdot \delta_1), \quad \beta_2 = -\alpha_1(\gamma_1 \cdot \delta_2). \tag{4.3}$$

There exist pathological cases where $\gamma_2 \perp \delta_1$ or $\gamma_1 \perp \delta_2$. Our algorithm then sets
$\beta_1 = \alpha_1$, resp. $\beta_2 = \alpha_2$. For the case $\gamma_i = \gamma_j$, (3.4) has no unique solution in the
least-squares sense. Assuming that $e_i$ is a straight line from $s(V_i)$ to $s(V_j)$, the α-factors
can then be determined from the projection onto the domain plane, where the size of the
so-called PS-triangles (the projections of the control triangles) is fixed. The reader can
verify that
this yields $\alpha_i = \alpha_j = \tfrac{1}{2}|V_iV_j|$.
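
The least-squares computation of the two α-factors of an edge can be sketched as
follows. The sketch assumes unit tangent vectors and solves the normal equations of the
overdetermined system $\alpha_i\gamma_i - \alpha_j\gamma_j = q_{i,j}$ in closed form; the function names and
the tolerance are illustrative assumptions.

```python
def dot(a, b):
    """Euclidean inner product of two vectors."""
    return sum(x * y for x, y in zip(a, b))


def alpha_factors(gamma_i, gamma_j, q_ij, tol=1e-12):
    """Least-squares solution (alpha_i, alpha_j) of
    alpha_i * gamma_i - alpha_j * gamma_j = q_ij for unit vectors gamma_i, gamma_j.
    """
    c = dot(gamma_i, gamma_j)
    D = 1.0 - c * c          # determinant of the normal equations
    if D < tol:
        raise ValueError("tangent vectors are parallel: no unique solution")
    gi_q = dot(gamma_i, q_ij)
    gj_q = dot(gamma_j, q_ij)
    alpha_i = (gi_q - gj_q * c) / D
    alpha_j = (-gj_q + gi_q * c) / D
    return alpha_i, alpha_j
```

The parallel case is rejected explicitly, mirroring the special treatment it receives in
the text.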

4.2  The  iteration  step 

First the control triangles from the previous steps are rescaled by subdivision. This is
simply done by scaling down the α- and β-factors: $\alpha_i \leftarrow \alpha_i/2$ and $\beta_i \leftarrow \beta_i/2$ for each
$V_i \in \partial\Delta$. Next, a new control triangle is created in between any two adjacent vertices at
the coarser level. This situation is illustrated in Figure 5, left, where the darker triangles
are known. We are looking for the α- and β-factors for the middle control polygon, which
is tangent to the surface at $s(V_k)$, $V_k = \tfrac{1}{2}(V_i + V_j)$. Consider the α-factor first. In order
to obtain a better fit, we try to interpolate $p$ at $V_{i,k} = \tfrac{1}{2}(V_i + V_k)$ and $V_{k,j} = \tfrac{1}{2}(V_k + V_j)$.
This yields a set of fitting equations

$$\begin{cases} \alpha_i\gamma_i - \alpha_k\gamma_k = q_{i,k}, \\ \alpha_k\gamma_k - \alpha_j\gamma_j = q_{k,j}, \end{cases} \tag{4.4}$$

where $\alpha_i$ and $\alpha_j$ are known. Thus, $\alpha_k$ can be obtained as the least-squares solution of
(4.4):

$$\alpha_k = \tfrac{1}{2}\gamma_k \cdot (\alpha_i\gamma_i - q_{i,k} + q_{k,j} + \alpha_j\gamma_j). \tag{4.5}$$

The $\beta_k$-factor is found by fitting the cross-boundary vectors at $V_{i,k}$ and $V_{k,j}$, i.e., by
solving the following system in the least-squares sense:

$$\begin{cases} \beta_{i,k}\delta_{i,k} = \tfrac{1}{2}(\beta_i\delta_i + \beta_k\delta_k), \\ \beta_{k,j}\delta_{k,j} = \tfrac{1}{2}(\beta_k\delta_k + \beta_j\delta_j), \end{cases} \tag{4.6}$$

where $\beta_i$ and $\beta_j$ are known. If $\delta_{i,k} = \delta_k = \delta_{k,j}$, as is always the case for a planar curve,
this system has no unique solution in the least-squares sense. The $\beta_k$ factor can then easily
be obtained by equation (3.6), i.e., by subdivision and upscaling.
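
The α-factor of the new middle control triangle is the solution of a one-unknown
least-squares problem, so it can be written down directly. The following sketch assumes
a unit tangent $\gamma_k$ and the two fitting equations $\alpha_i\gamma_i - \alpha_k\gamma_k = q_{i,k}$ and
$\alpha_k\gamma_k - \alpha_j\gamma_j = q_{k,j}$; the function name is hypothetical.

```python
def alpha_middle(gamma_i, gamma_k, gamma_j, alpha_i, alpha_j, q_ik, q_kj):
    """Least-squares alpha-factor at the midpoint vertex V_k.

    Minimises |alpha_i*gamma_i - a*gamma_k - q_ik|^2
            + |a*gamma_k - alpha_j*gamma_j - q_kj|^2 over a,
    assuming gamma_k is a unit vector.
    """
    v = [alpha_i * gi - qik + qkj + alpha_j * gj
         for gi, gj, qik, qkj in zip(gamma_i, gamma_j, q_ik, q_kj)]
    return 0.5 * sum(gk * vk for gk, vk in zip(gamma_k, v))
```

When the right-hand sides are generated from exact factors, the function returns the
factor that was used to build them, which gives a simple consistency check.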

4.3  The  interior  control  points 

Finally, as soon as the user supplied edge curves have been approximated well enough,
the interior control points at the final refinement level have to be calculated. We will
discuss three possibilities with the help of an example; Figure 6 shows a hole (left) and
two filling patches (right).

Fig. 5. The refinement and post-iteration steps.

Fig. 6. The hole and the triangular patches.

Copy From Initial. The interior control points are obtained directly from the initial
solution by subdivision. This guarantees that the interior of the patch is smooth. A
disadvantage is that the interior of the first approximation in general has no connection
with the shape of the edge curves. This can cause unwanted artefacts near the
boundary after a few iterations (see Figure 7, left). The next option will therefore
take edge features into account.

Averaging. We will fill the hole gradually by calculating a ring of control triangles
during each pass, going from the edge towards the interior of the patch. Figure 5, right,
shows an example where each ring has a different shade of grey. At each step, a
control triangle of the current ring is obtained by averaging six surrounding control
triangles. These come from the initial solution, or, if possible, from a previously
calculated ring. Edge features are now smoothed out towards the interior of the patch.
However, there is a major disadvantage to this approach if averaging is applied after
the last iteration step: the unwanted artefacts mentioned before are now repeated
for every ring, smoothed out towards the interior of the surface, as shown in Figure
7, middle.

Instant Update. A good compromise is to take edge features into account
before we finish iterating. This can be accomplished by subdividing the initial solution
at each refinement step, while always overwriting its edge with the most recent
boundary approximation. The results of this strategy are depicted in Figure 7, right.

In any case, the user can change the interior control triangles and still have a
$C^1$-continuous filling patch, fitting the specified edge curves with the demanded precision.

Fig. 7. Copy from initial solution and averaging (4 iterations); instant update (3 iterations).


Fig. 8. Cases with 4, 5 and 6 boundary curves.


4.4  A  note  on  the  number  of  edges 

The algorithm sketched in Section 4 is immediately applicable to problems with 4, 5
and 6 boundary curves as well. Figure 8 shows the configuration of the initial solution
for each of these cases. If we are working with 5 edges, there are 2 edges having a
control triangle at their midpoint (shaded darker). This requires a small modification to
the calculation of the initial solution for those edges. The α-factors are obtained by
solving (4.4) for the unknowns $\alpha_i$, $\alpha_j$ and $\alpha_k$. The β-factors of the outer control polygons
are obtained as usual; for the middle polygon one can apply (3.6). Also, for the cases of 5
and 6 boundary curves, an interior control triangle (unshaded) has to be calculated for
the initial solution. This can be done by averaging the six surrounding control polygons.
Abstract

Let $X := L^1([0, \tau_0])$, where $\tau_0$ represents the optical depth of a stellar
atmosphere. The weakly singular integral operator $T : X \to X$ defined by

$$(T\varphi)(\tau) = \frac{\varpi}{2} \int_0^{\tau_0} E_1(|\tau - \tau'|)\,\varphi(\tau')\,d\tau',$$

where $\varpi \in \,]0, 1[$ is the albedo of the atmosphere and $E_1$ denotes the first
exponential-integral function, is such that $\|T\|_1 = \varpi(1 - E_2(\tau_0/2))$, where
$E_2$ denotes the second exponential-integral function. If $\varpi$ is close to 1, and
$\tau_0$ is large, then $\|T\|_1$ is close to 1. In that case, the transfer problem

given $f \in X$, find $\varphi \in X$ such that $T\varphi = \varphi + f$

is ill-conditioned, and the convergence of the fixed-point iteration $\varphi_{k+1} = T\varphi_k - f$, which
is commonly used by numerical astronomers, becomes prohibitively slow. The purposes
of this work are to approximate $\varphi$ through different sequences whose terms solve
well-conditioned approximate equations, and to compare their efficiency and computational
costs.


1  Introduction 


For a given $\tau_0 > 0$, let $g$ be a function defined on $]0, \tau_0]$ such that

$$\lim_{\tau \to 0^+} g(\tau) = +\infty, \tag{1.1}$$

$$g \in C^0(]0, \tau_0]) \cap L^1([0, \tau_0]), \tag{1.2}$$

$$g(\tau) \ge 0 \text{ for all } \tau \in \,]0, \tau_0], \tag{1.3}$$

$$g \text{ is a decreasing function on } ]0, \tau_0]. \tag{1.4}$$

We consider the integral operator $T$ defined by

$$(Tx)(\tau) := \int_0^{\tau_0} g(|\tau - \tau'|)\,x(\tau')\,d\tau'. \tag{1.5}$$

Theorem 1 $T$ is a linear compact operator in $L^1([0, \tau_0])$ and

$$\|T\|_1 = 2 \int_0^{\tau_0/2} g(\tau)\,d\tau.$$

Proof: See [2]. □
For $z$ in the resolvent set of $T$, we consider the Fredholm equation of the second kind

$$T\varphi = z\varphi + f. \tag{1.6}$$

Applications will concern the function $g : \,]0, \tau_0] \to \mathbb{R}$ given by

$$g(\tau) := \frac{\varpi}{2} E_1(\tau), \tag{1.7}$$

where $\varpi \in \,]0, 1[$ and $E_1$ is the exponential-integral function:

$$E_1(\tau) := \int_1^{\infty} \frac{\exp(-\tau\mu)}{\mu}\,d\mu, \quad \tau > 0.$$

$E_1$ is the first function of the sequence $(E_\nu)_{\nu \ge 1}$,

$$E_\nu(\tau) := \int_1^{\infty} \frac{\exp(-\tau\mu)}{\mu^\nu}\,d\mu, \quad \tau \ge 0, \ \nu \ge 2,$$

and it is the only one presenting a logarithmic singularity at $\tau = 0$.
Following Theorem 1, when $g$ is defined by (1.7), we have $\|T\|_1 = \varpi[1 - E_2(\tau_0/2)] < 1$.
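
To get a feeling for how close this norm is to 1, the second exponential-integral
function can be evaluated numerically. The sketch below is an illustration only: it uses
the substitution $\mu = 1/t$, which turns $E_2(\tau)$ into $\int_0^1 \exp(-\tau/t)\,dt$, and a plain
midpoint rule; the function names, the number of panels and the sample albedo are
assumptions, not values from the text.

```python
import math


def exp_integral_2(x, panels=20000):
    """Approximate E_2(x) = int_1^inf exp(-x*mu)/mu^2 dmu for x >= 0.

    With mu = 1/t the integral becomes int_0^1 exp(-x/t) dt, which is
    evaluated here with the composite midpoint rule.
    """
    h = 1.0 / panels
    total = 0.0
    for i in range(panels):
        t = (i + 0.5) * h
        total += math.exp(-x / t)
    return total * h


def operator_norm(albedo, tau0):
    """Norm of T in L^1 for the kernel (albedo / 2) * E_1, by the formula above."""
    return albedo * (1.0 - exp_integral_2(tau0 / 2.0))
```

For a large optical depth such as `tau0 = 1000`, `exp_integral_2(tau0 / 2)` is negligible,
so `operator_norm(albedo, tau0)` is essentially the albedo itself.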

We recall that a bounded linear finite rank operator $T_n$ in a normed linear space $X$
can be written as

$$T_n x := \sum_{j=1}^{n} \langle x, \ell_{n,j} \rangle\, e_{n,j}, \tag{1.8}$$

where $n \in \mathbb{N}^*$, and, for $j \in [1, n]$, $\ell_{n,j} \in X^*$, the topological adjoint space of $X$, and
$e_{n,j} \in X$.

The resolution of the approximate equation

$$T_n\varphi_n = z\varphi_n + f, \tag{1.9}$$

where $z$ belongs to the resolvent set of $T_n$, leads to an n-dimensional linear system

$$(A_n - zI_n)x_n = b_n, \tag{1.10}$$

where $I_n$ is the identity matrix of order $n$,

$$A_n(i, j) := \langle e_{n,j}, \ell_{n,i} \rangle, \quad b_n(i) := \langle f, \ell_{n,i} \rangle, \quad x_n(j) := \langle \varphi_n, \ell_{n,j} \rangle. \tag{1.11}$$

Once this system is solved, the solution of (1.9) is given by

$$\varphi_n = \frac{1}{z}\Big(\sum_{j=1}^{n} x_n(j)\,e_{n,j} - f\Big). \tag{1.12}$$


We are interested in refining approximations obtained with $T_n := \pi_n T$, where $(\pi_n)$ is
a sequence of projections with finite rank $n$. A bounded projection $\pi_n$ of finite rank $n$ is
defined by

$$\pi_n x := \sum_{j=1}^{n} \langle x, e^*_{n,j} \rangle\, e_{n,j} \quad \text{for all } x \in X,$$

where $(e_{n,j})_{j=1}^{n}$ is an ordered basis of the range of $\pi_n$, and $(e^*_{n,j})_{j=1}^{n}$ is an adjoint
basis of the former in $X^*$. Hence

$$T_n x := \sum_{j=1}^{n} \langle Tx, e^*_{n,j} \rangle\, e_{n,j}, \quad x \in X. \tag{1.13}$$

j-i 

We suppose that $\pi_n$ is pointwise convergent to the identity operator in the Banach space
$X$ where the operator $T$ is defined. Since $T$ is compact, $T_n$ converges to $T$ in the operator
norm. Let $R(z) := (T - zI)^{-1}$ be the resolvent of $T$ at $z$. Then $R_n(z) := (T_n - zI)^{-1}$
exists for $n$ large enough and is uniformly bounded, that is, there exists $n_0$ such that

$$c_0(z) := \sup_{n \ge n_0} \|R_n(z)\| < +\infty. \tag{1.14}$$

We develop an application in the space $X := L^1([0, \tau_0])$. Let $(\tau_{n,j})_{j=0}^{n}$ be a grid on
$[0, \tau_0]$ such that

$$0 =: \tau_{n,0} < \tau_{n,1} < \cdots < \tau_{n,n-1} < \tau_{n,n} := \tau_0, \tag{1.15}$$

and set

$$h_{n,j} := \tau_{n,j} - \tau_{n,j-1} \quad \text{for } j \in [1, \ldots, n]. \tag{1.16}$$

We define, for $\tau \in [0, \tau_0]$,

$$e_{n,j}(\tau) := \begin{cases} 1 & \text{if } \tau \in \,]\tau_{n,j-1}, \tau_{n,j}[, \\ 0 & \text{otherwise,} \end{cases} \tag{1.17}$$

and, for $x \in L^1([0, \tau_0])$,

$$\langle x, e^*_{n,j} \rangle := \frac{1}{h_{n,j}} \int_{\tau_{n,j-1}}^{\tau_{n,j}} x(\tau')\,d\tau'. \tag{1.18}$$


The product defined in (1.18) is a special case of the scalar product used in equation
(1.8) when a grid such as (1.15) is set. In this case the operator in (1.13) is the operator
in (1.8) if we choose $\ell_{n,j} := T^* e^*_{n,j}$. Let

$$\underline{h}_n := \min\{h_{n,j} : j \in [1, \ldots, n]\}, \quad h_n := \max\{h_{n,j} : j \in [1, \ldots, n]\}, \quad q_n := \frac{\underline{h}_n}{h_n}. \tag{1.19}$$

For quasi-uniform grids, there exists a constant $q$ independent of $n$ such that, for all $n$,
$q \le q_n$. For uniform grids, $q_n = 1$ for all $n$.

Theorem 2 Let $\varphi \ne 0$ be the solution of (1.6) with $T$ defined by (1.5). Let $\varphi_n$ be the
solution of (1.9) with $T_n$ defined by (1.8) and (1.15)-(1.17). Then, for $n$ large enough,


ZjfAl  <  8co(2) 


—  fg\-r)  dT, 

Qn  JO 


where $c_0(z)$ is given by (1.14) and computed with the 1-norm.

Proof:  See  [2].  □ 

In the case (1.7), the matrix $A_n$ of the linear system (1.10) has entries

$$A_n(i, j) := \frac{\varpi}{2h_{n,i}} \int_{\tau_{n,i-1}}^{\tau_{n,i}} \int_0^{\tau_0} E_1(|\tau - \tau'|)\,e_{n,j}(\tau')\,d\tau'\,d\tau, \tag{1.21}$$

and the right-hand side $b_n$ has entries

$$b_n(i) := \frac{\varpi}{2h_{n,i}} \int_{\tau_{n,i-1}}^{\tau_{n,i}} \int_0^{\tau_0} E_1(|\tau - \tau'|)\,f(\tau')\,d\tau'\,d\tau. \tag{1.22}$$

For more details, see [3]. An application to the transfer problem in astrophysics gives
(1.6) with $z = 1$, and as free term,

$$f(\tau) := \begin{cases} -1 & \text{if } 0 \le \tau \le \tau_0/2, \\ 0 & \text{if } \tau_0/2 < \tau \le \tau_0, \end{cases} \tag{1.23}$$

which describes a sudden drop of the temperature on the $\tau = \tau_0/2$ layer of the
atmosphere. For further details on the physical model, see [4].

2  Iterative  refinement  of  approximate  solutions 

To attain a given precision on the approximate solution $\varphi_n$, it may be necessary that the
largest grid step $h_n$ be so small that the dimension of the corresponding linear system
becomes prohibitively large from a computational point of view. Not only does the
algorithm's stability become poor, but the condition number of the matrix may also increase
with its size. Refinement schemes allow us to attain iteratively the exact solution of a
large scale linear system by means of the resolution of a sequence of linear systems of
moderate fixed size. Let us consider the general framework of a complex Banach space
$X$ and a linear compact operator $T : X \to X$. If $z$ is in the resolvent set of $T$, then
$z \ne 0$. Let $(T_n)$ be a sequence of linear bounded operators in $X$ such that $\|T - T_n\| \to 0$
in the operator norm. Then, for $n$ large enough, $z$ belongs to the resolvent set of $T_n$ and
$R_n(z)$ is norm-convergent to $R(z)$.

The most elementary way to refine the approximate solution $\varphi_n := R_n(z)f$ is the
following.

Scheme A
$$\begin{cases} x^{(0)} := \varphi_n, \\ x^{(k+1)} := x^{(k)} - R_n(z)(Tx^{(k)} - zx^{(k)} - f), \quad k \ge 0. \end{cases} \tag{2.1}$$

We can interpret $R_n(z)$ as an approximation of the inverse of the Frechet derivative of
the affine operator $x \mapsto (T - zI)x - f$, the exact one being $R(z)$. Since $R(z)$ satisfies the
identities

$$R(z) = \frac{1}{z}(R(z)T - I) = \frac{1}{z}(TR(z) - I), \tag{2.2}$$

two new different approximations of $R(z)$ are thus motivated,

$$\widetilde{R}_n(z) := \frac{1}{z}(R_n(z)T - I), \quad \widehat{R}_n(z) := \frac{1}{z}(TR_n(z) - I). \tag{2.3}$$

These approximate resolvent operators lead to the following iterative refinement schemes.

Scheme B
$$\begin{cases} x^{(0)} := \widetilde{R}_n(z)f, \\ x^{(k+1)} := x^{(k)} - \widetilde{R}_n(z)(Tx^{(k)} - zx^{(k)} - f), \quad k \ge 0. \end{cases} \tag{2.4}$$

Scheme C
$$\begin{cases} x^{(0)} := \widehat{R}_n(z)f, \\ x^{(k+1)} := x^{(k)} - \widehat{R}_n(z)(Tx^{(k)} - zx^{(k)} - f), \quad k \ge 0. \end{cases} \tag{2.5}$$
Since the computation of residuals which tend to zero, as well as the resolution of almost
homogeneous linear systems, may be unstable, the following theorems are interesting for
algorithmic purposes.

Theorem 3 In (2.1), $x^{(k+1)} = x^{(0)} + R_n(z)(T_n - T)x^{(k)}$ for $k \ge 0$.

Theorem 4 In (2.4), $x^{(k+1)} = x^{(0)} + \frac{1}{z} R_n(z)(T_n - T)Tx^{(k)}$ for $k \ge 0$.

Theorem 5 In (2.5), $x^{(k+1)} = x^{(0)} + \frac{1}{z} TR_n(z)(T_n - T)x^{(k)}$ for $k \ge 0$.

Proof: For Theorem 3 and each $k \ge 0$,

$$\begin{aligned} x^{(k+1)} &= x^{(k)} - R_n(z)(Tx^{(k)} - zx^{(k)} - f) \\ &= x^{(k)} - R_n(z)(T - T_n + T_n - zI)x^{(k)} + x^{(0)} \\ &= x^{(0)} + R_n(z)(T_n - T)x^{(k)}. \end{aligned}$$

For Theorems 4 and 5, the proof follows the same idea but is technically more complicated.

□

In our application to the transfer equation in astrophysics, $T$ is defined by (1.5) with
$g$ given by (1.7), and the equation (1.6) has $z = 1$.


3  Numerical  computations 

The iterative refinement schemes allow us to obtain the exact solution of a large scale
linear system by solving a sequence of moderate fixed size ones. Each of the three iterative
refinement schemes presented in this work is based on an approximation, say $G_n(z)$,
of the resolvent operator $R(z)$. Their common structure is the following.

$$\xi^{(0)} := G_n(z) f, \qquad \xi^{(k+1)} := \xi^{(0)} + \big(I - G_n(z)(T - zI)\big)\xi^{(k)}, \quad k \ge 0. \tag{3.1}$$
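The common structure (3.1) can be sketched numerically as follows. This is only an illustration under stated assumptions, not the authors' implementation: the matrix `T` is a hypothetical smooth kernel matrix standing in for the fine operator, `Tn` is a hypothetical block-averaged approximation of it, and the helper `refine` is introduced here. The update is written in the equivalent residual form $\xi^{(k+1)} = \xi^{(k)} - G_n(z)\big((T - zI)\xi^{(k)} - f\big)$.

```python
import numpy as np

def refine(G, T, z, f, tol=1e-12, max_iter=200):
    """Iterate xi <- xi - G((T - zI) xi - f), starting from xi = G f."""
    xi = G @ f
    for k in range(max_iter):
        residual = T @ xi - z * xi - f
        if np.linalg.norm(residual) <= tol * np.linalg.norm(f):
            return xi, k
        xi = xi - G @ residual
    return xi, max_iter

# Hypothetical test problem (an assumption, not the transfer equation of the paper).
m, block, z = 60, 6, 1.0
x = (np.arange(m) + 0.5) / m
T = 0.4 / m * np.exp(-np.abs(x[:, None] - x[None, :]))   # stand-in for T
P = np.kron(np.eye(m // block), np.full((block, block), 1.0 / block))
Tn = P @ T                                                # stand-in for T_n
I = np.eye(m)
Rn = np.linalg.inv(Tn - z * I)                            # R_n(z)
f = np.ones(m)

# The three choices of G_n(z): Scheme A, Scheme B and Scheme C.
G = {"A": Rn, "B": (Rn @ T - I) / z, "C": (T @ Rn - I) / z}
exact = np.linalg.solve(T - z * I, f)
results = {}
for name, Gn in G.items():
    xi, its = refine(Gn, T, z, f)
    results[name] = (its, np.linalg.norm(xi - exact))
    print(name, its, results[name][1])
```

In a practical setting $R_n(z)$ would be applied by solving a small system rather than by forming a dense inverse; the dense inverse is used here only to keep the sketch short.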

Theorem 6 Let $c_1(z) := 8 c_0(z) \max\{1, \|T\|_1 / |z|\}$, and $(\xi^{(k)})_{k \ge 0}$ be any of the sequences
(2.1), (2.4) or (2.5). Then


Mix  '  9n  Jo  ' 


k>  0. 


Proof: Let us prove the bound for the sequence defined by (2.1). For the other two,
the arguments are similar. Using Theorem 3, we have

$$x^{(k)} - \varphi = \big(R_n(z)(T_n - T)\big)^k \big(x^{(0)} - \varphi\big), \qquad x^{(0)} - \varphi = R_n(z)(T - T_n)\varphi.$$


Hence, 


x  <1  \(Rn(z)(T-Tn))‘ 


and,  in  [2],  we  have  shown  that  \\Rn(z)(Tn  —  T)||,  <  i  f  g(r)  dr.  □ 

Qn  Jo 

All the schemes need evaluations of $T$ at some prescribed functions of $X$. In practice
$T$ is not used for this purpose; an operator $T_m$ of the sequence $(T_\nu)_{\nu \ge 1}$ is used instead,
where $m > n$. We consider the kernel $g$ defined by (1.7) and the free term $f$ defined
by (1.23). Table 1 gives the number of iterations performed by each scheme for several
values of $\varpi$ in order to obtain a first relative residual less than or equal to $10^{-12}$, when
a quasi-uniform grid $(\tau_{\nu,i})_{i=0}^{\nu}$ is built such that $\nu$ is a multiple of 10, $\tau_0 = 1000$,


n  =  200,  m  =  1000,  and  hvi  :=  < 


12. 

2v 

To_ 
5i / 


if  i€[l,...,f], 
if  t€[  f  +  l,...,f], 


Tq 

2v 


(3.2) 


_y.  if  zg[|  +  1,...,^], 


4  r0  . 


if  i  £  [f§  +  1?  •  •  *  jH* 


^  i/ 


Albedo $\varpi$    Scheme A (2.1)    Scheme B (2.4)    Scheme C (2.5)
0.750              29                15                14
0.990              46                27                26
0.999              385               196               195

TAB 1. Number of iterations.


Figures 1, 2 and 3 show the last iterate of all schemes, as well as the corresponding
convergence histories, for $\varpi \in \{0.750, 0.990, 0.999\}$. As we can see, schemes B and
C are much faster than Atkinson's formula A, especially when the albedo is close to 1. In
the latter situation a wider boundary layer arises at the left of the atmosphere, and the
decay at the middle point takes place along a wider subinterval.

A survey of different discretization methods for integral operators can be found in
[1], with special emphasis on spectral applications. For the condition number
of the associated linear systems, the reader is referred to [7], [5] and [6].
[Figure: panels "Solution" and "Residual"]

Fig. 1. Solution and convergence history for $\varpi = 0.750$: Scheme A, dashed line;
Scheme B, dotted line; Scheme C, solid line.


[Figure: panels "Solution" and "Residual"]

Fig. 2. Solution and convergence history for $\varpi = 0.990$: Scheme A, dashed line;
Scheme B, dotted line; Scheme C, solid line.


Geometrical  symmetry  in  symmetric  Galerkin  BEM 

Alessandra  Aimi  and  Mauro  Diligenti 

Department  of  Mathematics,  University  of  Parma,  Italy. 

alessandra.aimi@unipr.it,  mauro.diligenti@unipr.it 

Abstract 

We consider a symmetric boundary integral formulation associated with a mixed boundary
value problem defined on a domain $\Omega \subset \mathbb{R}^2$ with piecewise smooth boundary $\Gamma$.

We assume that $\Omega$ is mapped onto itself by a finite group $\mathcal{G}$ of congruences having at
least two distinct elements. Hence, we can decompose the related symmetric Galerkin
BEM problem into independent subproblems of reduced dimension with respect to the
complete one. Shape functions for each subproblem can be obtained from the classical BEM
basis, ordered as a vector, by applying suitable restriction matrices constructed starting
from group representation theory.

1  Introduction 

Let $\Omega \subset \mathbb{R}^2$ be a bounded domain with a piecewise smooth boundary $\Gamma$. The boundary
$\Gamma$ is partitioned into two non-intersecting open subsets $\Gamma_1$ and $\Gamma_2$, with
$\Gamma = \bar\Gamma_1 \cup \bar\Gamma_2 = \bigcup_j \bar\Gamma^j$, the $\Gamma^j$ being open straight line segments. In the following we always assume
$\operatorname{meas} \Gamma_1 > 0$. The solution of the mixed boundary value problem

$$L(x)u(x) = 0 \quad \text{in } \Omega, \tag{1.1}$$

$$u(x) = u^*(x) \quad \text{on } \Gamma_1, \qquad q(x) := \frac{\partial u}{\partial n}(x) = q^*(x) \quad \text{on } \Gamma_2, \tag{1.2}$$

can be expressed by the representation formula

$$u(x) = \int_\Gamma U(x,y)\, q(y)\, dy - \int_\Gamma \frac{\partial}{\partial n_y} U(x,y)\, u(y)\, dy, \qquad x \in \Omega. \tag{1.3}$$

In (1.1) $L(\cdot)$ is an elliptic partial differential operator of second order and $U(x,y)$ its
fundamental solution (see [4] for a general discussion). In (1.2) $\partial/\partial n$ denotes the derivative
with respect to the outer normal $n$ to $\Gamma$, and $u^*$ and $q^*$ are given functions. Applications
of (1.1)-(1.2) are, for instance, boundary value problems in potential theory and
in elastostatics. From (1.3) it is clear that if we want to recover $u$ in $\Omega$ we first have to
know the remaining Cauchy data, since in (1.2) these functions are given only partially.
Taking the limit of $u(x)$ for $x \in \Gamma_1$ and the normal derivative $\frac{\partial u}{\partial n}(x)$ for $x \in \Gamma_2$ in this
formula and using the jump relations, one finds the system [2]

$$\begin{aligned}
\int_{\Gamma_1} U(x,y)\, q(y)\, dy - \int_{\Gamma_2} \frac{\partial}{\partial n_y} U(x,y)\, u(y)\, dy &= f_1(x), \qquad x \in \Gamma_1, \\
-\int_{\Gamma_1} \frac{\partial}{\partial n_x} U(x,y)\, q(y)\, dy + \int_{\Gamma_2} \frac{\partial^2}{\partial n_x \partial n_y} U(x,y)\, u(y)\, dy &= f_2(x), \qquad x \in \Gamma_2.
\end{aligned} \tag{1.4}$$

In order to perform the Galerkin method, we need a family of finite-dimensional subspaces
$\{U_{h,p}(\Gamma)\}$ defined on $\Gamma$. Let us define a mesh $\mathcal{T}_{jh}$ for each $\Gamma_j$: $\bar\Gamma_j = \bigcup_i \bar\Gamma_{jh,i}$,
such that each $\Gamma_{jh,i}$ is an open segment. We define, for $p \ge 0$, $h > 0$, $U_{h,p}(\Gamma_1)$ to be the set
of functions on $\Gamma_1$ whose restrictions to $\Gamma_{1h,i} \subset \Gamma_1$ belong to the set of all polynomials of
degree $\le p$ on $\Gamma_{1h,i}$. Moreover, for $p \ge 1$, $U^0_{h,p}(\Gamma_2)$ will denote those functions of the same
piecewise polynomial type on $\Gamma_2$ which belong to $C^0(\Gamma_2)$ and which vanish at the end points
of $\Gamma_2$. The approximating boundary element shape functions of degree $p \ge 0$ are defined
through the standard assembling of the local basis functions defined on each $\Gamma_{jh,i}$. We
then define

$$U_{h,p}(\Gamma) := \operatorname{span}\{(\varphi_\ell, \psi_\ell) : \varphi_\ell \in U^0_{h,p}(\Gamma_2),\ \psi_\ell \in U_{h,p}(\Gamma_1)\}. \tag{1.5}$$

The  corresponding  symmetric  Galerkin  boundary  elements  scheme  for  (1.4)  leads  to  a 
linear  system  of  the  form 

A£=b.  (1.6) 

If the boundary $\Gamma$ presents symmetry properties, we will exploit them to reduce the
computational cost of the solution of (1.6), using a decomposition result for the Galerkin
boundary element problem that we will introduce at the end of the next section.

2  Matrix  representation  of  a  finite  group  of  congruences  and 
projection  operators 

Let $\mathcal{G}$ be a finite group of $t$ congruences ($t \ge 2$) of the Euclidean space $\mathbb{R}^m$ ($m = 2, 3$).
The group $\mathcal{G}$ can be described by orthogonal matrices $\gamma_i$ of order $m$. Let $\{\gamma_1, \dots, \gamma_t\}$ be
the elements of $\mathcal{G}$, $\gamma_1$ the identity matrix. From the theory of group representation [5] it
follows that any finite group $\mathcal{G}$ admits a finite number $q$ of unitary irreducible, pairwise
inequivalent matrix representations

$$\{\omega^{(1)}(\gamma_i)\},\ \{\omega^{(2)}(\gamma_i)\},\ \dots,\ \{\omega^{(q)}(\gamma_i)\}, \qquad i = 1, \dots, t. \tag{2.1}$$

Let $d_\ell$ be the order of the representation $\{\omega^{(\ell)}(\gamma_i)\}$, i.e., the order of the matrices
$\omega^{(\ell)}(\gamma_i)$. The number $q$ of the representations (2.1) and the orders $d_1, \dots, d_q$ only depend
on $\mathcal{G}$. Any representation $\{\omega^{(\ell)}(\gamma_i)\}$ of order $d_\ell \ge 2$ can be replaced, in the
system (2.1), by an equivalent unitary representation. Representations of order 1 are
uniquely determined. We observe that, if $\gamma_i$ and $\gamma_j$ are two elements of $\mathcal{G}$, then
$\omega^{(\ell)}(\gamma_i \gamma_j) = \omega^{(\ell)}(\gamma_i)\,\omega^{(\ell)}(\gamma_j)$ and $\omega^{(\ell)}(\gamma_i^{-1}) = [\omega^{(\ell)}(\gamma_i)]^*$, where $[\omega^{(\ell)}(\gamma_i)]^*$ denotes the
transpose of the matrix $\omega^{(\ell)}(\gamma_i)$. Again from the theory of group representation it follows
that $q \le t$ and the relation $d_1^2 + d_2^2 + \cdots + d_q^2 = t$ holds. Furthermore, $q = t$ if and only
if $d_1 = d_2 = \cdots = d_q = 1$. Having set $M = d_1 + d_2 + \cdots + d_q$, then $q \le M \le t$, and we
have $q = M = t$ if and only if $\mathcal{G}$ is an abelian group.

Let $\Omega$ be a bounded domain in $\mathbb{R}^2$ with a piecewise smooth boundary $\Gamma$, invariant
with respect to $\mathcal{G}$, i.e., sent onto itself by the congruences of $\mathcal{G}$. The boundary $\Gamma$ is then also
invariant with respect to $\mathcal{G}$, i.e., for any $\gamma_i \in \mathcal{G}$ and $x \in \Gamma$, $(\gamma_i x) \in \Gamma$.
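Let $\mathcal{W}(\Gamma)$ be the real vector space of real functions defined on $\Gamma$. We can associate
to any element $\gamma_i$ of $\mathcal{G}$ a linear transformation $T_i$ defined, for any $v \in \mathcal{W}(\Gamma)$, by

$$(T_i v)(x) := v(\gamma_i^{-1} x), \qquad x \in \Gamma, \tag{2.2}$$

where $T_i$ is a linear, invertible transformation from $\mathcal{W}(\Gamma)$ onto $\mathcal{W}(\Gamma)$, and $T_1$ is the
identity.

Definition 2.1 A subset $\mathcal{V}(\Gamma)$ of $\mathcal{W}(\Gamma)$ is said to be invariant with respect to $\mathcal{G}$ (or
$\mathcal{G}$-invariant) if for any $v \in \mathcal{V}(\Gamma)$ and any $\gamma_i \in \mathcal{G}$, $T_i v \in \mathcal{V}(\Gamma)$.

Obviously if $v$ is a function of $\mathcal{W}(\Gamma)$, not identically equal to zero, the set of functions
$\{T_i v,\ i = 1, \dots, t\}$ is invariant with respect to $\mathcal{G}$.

Definition 2.2 Let $\mathcal{L}$ be a linear operator in $\mathcal{V}(\Gamma)$. We will say that $\mathcal{L}$ is invariant with
respect to $\mathcal{G}$ if for any $u \in \mathcal{V}(\Gamma)$: $\mathcal{L} T_i u = T_i \mathcal{L} u$, $i = 1, \dots, t$.

Example 2.3 Let $\mathcal{V}(\Gamma)$ be a suitable Sobolev space and $(\mathcal{L} f)(x) := \int_\Gamma \mathcal{K}(x,y) f(y)\, d\Gamma_y$
an integral operator defined on $\mathcal{V}(\Gamma)$, with kernel $\mathcal{K}(x,y)$.

We have: $T_i(\mathcal{L} f)(x) = \int_\Gamma \mathcal{K}(\gamma_i^{-1} x, y) f(y)\, d\Gamma_y$; since $\gamma_i \in \mathcal{G}$ is an isometry, the mapping
$y \mapsto \gamma_i y$ preserves the differential element $d\Gamma_y$. Thus

$$\mathcal{L}(T_i f)(x) = \int_\Gamma \mathcal{K}(x,y) f(\gamma_i^{-1} y)\, d\Gamma_y = \int_\Gamma \mathcal{K}(x, \gamma_i y) f(y)\, d\Gamma_y.$$

Then the integral operator $\mathcal{L}$ is $\mathcal{G}$-invariant if the kernel $\mathcal{K}(x,y)$ satisfies the condition
$\mathcal{K}(x,y) = \mathcal{K}(\gamma_i x, \gamma_i y)$ for all $x, y \in \Gamma$, $i = 1, \dots, t$.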
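The invariance condition on the kernel in Example 2.3 can be checked numerically for a concrete case. The sketch below is illustrative only and rests on two assumptions that are not taken from the text: the kernel is the fundamental solution of the two-dimensional Laplacian, which depends only on the distance $|x - y|$, and the group is the symmetry group of a square (four rotations and four reflections).

```python
import math

def kernel(x, y):
    """Assumed kernel: fundamental solution of the 2D Laplacian."""
    return -math.log(math.dist(x, y)) / (2.0 * math.pi)

def rotate(theta):
    c, s = math.cos(theta), math.sin(theta)
    return lambda p: (c * p[0] - s * p[1], s * p[0] + c * p[1])

def reflect_x(p):
    return (p[0], -p[1])

# Assumed group: symmetries of a square, i.e. 4 rotations and 4 reflections.
rotations = [rotate(k * math.pi / 2) for k in range(4)]
group = rotations + [lambda p, r=r: r(reflect_x(p)) for r in rotations]

# Largest violation of K(x, y) = K(gamma x, gamma y) over the group.
x, y = (0.3, -0.7), (1.1, 0.4)
max_defect = max(abs(kernel(g(x), g(y)) - kernel(x, y)) for g in group)
print(max_defect)
```

Any kernel that is a function of $|x - y|$ alone passes this check, because congruences preserve distances.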

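Starting from the group $\mathcal{G}$, the system of representations (2.1) and the linear transformations
$T_i$ defined by (2.2), we can introduce $M$ linear transformations of $\mathcal{W}(\Gamma)$,

$$P_{\ell k} := \frac{d_\ell}{t} \sum_{i=1}^{t} \omega^{(\ell)}_{kk}(\gamma_i)\, T_i, \qquad \ell = 1, \dots, q, \quad k = 1, \dots, d_\ell. \tag{2.3}$$

Owing to the properties of the representations (2.1), there holds

$$P_{\ell k}^2 = P_{\ell k}, \qquad P_{\ell k} P_{\ell' k'} = 0 \ \text{ if } (\ell, k) \neq (\ell', k'), \qquad \sum_{\ell=1}^{q} \sum_{k=1}^{d_\ell} P_{\ell k} = T_1. \tag{2.4}$$

The linear transformations $P_{\ell k}$, which will be called projection operators, determine a
decomposition of any vector space $\mathcal{V}(\Gamma) \subset \mathcal{W}(\Gamma)$ invariant with respect to $\mathcal{G}$ into a
direct sum of $M$ subspaces $\mathcal{V}_{\ell k}(\Gamma)$; $\mathcal{V}_{\ell k}(\Gamma)$ is the co-domain of $P_{\ell k}$, viewed as a linear
transformation from $\mathcal{V}(\Gamma)$ into itself.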
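The relations (2.4) can be verified by hand on a small abelian example. The sketch below is an illustration under assumptions that are not in the text: the group consists of the identity, the two axis reflections and the rotation by $\pi$ (so $t = 4$ and all $d_\ell = 1$), its elements and representations are indexed from 0 by two-bit integers, and the transformations $T_i$ act on functions defined on the orbit of a generic point, where they are permutation matrices.

```python
t = 4  # assumed group: identity, two axis reflections, rotation by pi

def character(l, i):
    """Order-1 representation omega^(l)(gamma_i), equal to +1 or -1."""
    return -1 if bin(l & i).count("1") % 2 else 1

def T(i):
    """Matrix of T_i on a generic orbit: (T_i v)(p_j) = v(p_{i xor j})."""
    return [[1.0 if c == (i ^ j) else 0.0 for c in range(t)] for j in range(t)]

def matmul(A, B):
    return [[sum(A[r][k] * B[k][c] for k in range(t)) for c in range(t)]
            for r in range(t)]

def projection(l):
    """P_l = (d_l / t) * sum_i omega^(l)(gamma_i) T_i, with d_l = 1."""
    P = [[0.0] * t for _ in range(t)]
    for i in range(t):
        Ti = T(i)
        for r in range(t):
            for c in range(t):
                P[r][c] += character(l, i) * Ti[r][c] / t
    return P

P = [projection(l) for l in range(t)]
identity = T(0)
zero = [[0.0] * t for _ in range(t)]

# The three relations in (2.4).
idempotent = all(matmul(P[l], P[l]) == P[l] for l in range(t))
orthogonal = all(matmul(P[l], P[m]) == zero
                 for l in range(t) for m in range(t) if l != m)
total = [[sum(P[l][r][c] for l in range(t)) for c in range(t)] for r in range(t)]
print(idempotent, orthogonal, total == identity)
```

All entries are multiples of 1/16, so the comparisons are exact in floating point for this particular group.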

If $\mathcal{G}$ is a non-abelian group, it is useful to consider in the space $\mathcal{W}(\Gamma)$ further linear
transformations linked to the system (2.1). Let $\{\omega^{(\ell)}(\gamma_i)\}$ be a representation of $\mathcal{G}$ of
order $d_\ell \ge 2$. Let us consider $d_\ell^2$ linear transformations, already introduced in [1], defined
as follows

$$A^{(\ell)}_{kr} := \frac{d_\ell}{t} \sum_{i=1}^{t} \omega^{(\ell)}_{kr}(\gamma_i)\, T_i, \qquad k, r = 1, \dots, d_\ell. \tag{2.5}$$

If $k = r$, then $A^{(\ell)}_{kk} = P_{\ell k}$.

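Definition 2.4 Let $B(\cdot,\cdot)$ be a bilinear form from $\mathcal{V}(\Gamma) \times \mathcal{V}(\Gamma)$ to $\mathbb{R}$. We will say that
$B(\cdot,\cdot)$ is $\mathcal{G}$-invariant if, for any $u, v \in \mathcal{V}(\Gamma)$,

$$B(T_i u, T_i v) = B(u, v), \qquad i = 1, \dots, t. \tag{2.6}$$

Let $\mathcal{V}(\Gamma)$ be a Hilbert space and let us consider the following problem

$$\text{find } u \in \mathcal{V}(\Gamma) : \quad B(u, v) = \mathcal{F}(v) \quad \text{for all } v \in \mathcal{V}(\Gamma), \tag{2.7}$$

where $B(\cdot,\cdot)$ is continuous and coercive, and $\mathcal{F}(\cdot) : \mathcal{V}(\Gamma) \to \mathbb{R}$ is a linear continuous
functional. If $\Gamma$ and $\mathcal{V}(\Gamma)$ are invariant with respect to $\mathcal{G}$, and $\mathcal{V}(\Gamma) = \bigoplus_{\ell=1}^{q} \bigoplus_{k=1}^{d_\ell} \mathcal{V}_{\ell k}(\Gamma)$
is the decomposition of $\mathcal{V}(\Gamma)$ defined by the projection operators (2.3), the following
fundamental result holds.

Theorem 2.5 If $B(\cdot,\cdot)$ verifies the condition (2.6) and $P_{\ell k}$ are the projection operators
defined in (2.3), then the problem (2.7) can be decomposed into $M$ independent problems:
find $u_{\ell k} \in \mathcal{V}_{\ell k}(\Gamma)$ such that

$$B(u_{\ell k}, v_{\ell k}) = \mathcal{F}(v_{\ell k}) \quad \text{for all } v_{\ell k} \in \mathcal{V}_{\ell k}(\Gamma), \qquad \ell = 1, \dots, q; \quad k = 1, \dots, d_\ell. \tag{2.8}$$

The solution of (2.7) can be recovered as $u = \bigoplus_{\ell=1}^{q} \bigoplus_{k=1}^{d_\ell} u_{\ell k}$.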

The above result can be applied, under the invariance hypothesis, in discrete form
to the symmetric Galerkin BEM scheme if we choose the finite-dimensional subspace
$U_{h,p}(\Gamma)$ defined in (1.5) to be $\mathcal{G}$-invariant too, and therefore decomposable as
$U_{h,p}(\Gamma) = \bigoplus_{\ell=1}^{q} \bigoplus_{k=1}^{d_\ell} U^{\ell k}_{h,p}(\Gamma)$. Then the symmetric Galerkin boundary element problem can be
decomposed into $M$ independent problems which have reduced dimension with respect
to the original one and which can be solved on parallel processors. Now one has to
construct boundary element basis functions for each subspace $U^{\ell k}_{h,p}(\Gamma)$. With some simple
geometries (and groups of congruences) this can be done directly, but in many cases this
is a difficult task. We solve it here by applying restriction matrices, which we introduce
in the next sections, to the basis of $U_{h,p}(\Gamma)$, ordered as a vector. Since there is a one-to-one
correspondence between the standard boundary element shape functions and the
nodes of the mesh fixed on $\Gamma$, in the following we will work directly on the nodes of the
boundary.

3  Elementary  restriction  matrices 

In this section we introduce suitable matrices depending only on the group $\mathcal{G}$ and on the
system of representations (2.1), which will be called elementary restriction matrices. In
the following sections we will see how, starting from these, we can construct restriction
matrices relative to a mesh defined on $\Gamma$. We fix a finite group $\mathcal{G} = \{\gamma_1, \dots, \gamma_t\}$ of
congruences of $\mathbb{R}^m$ and a system (2.1) of orthogonal irreducible, pairwise inequivalent
representations of $\mathcal{G}$. $\mathcal{G}$ always admits the representation $\{1, 1, \dots, 1\}$, which we indicate
by $\{\omega^{(1)}(\gamma_i)\}$; let us order the remaining representations (2.1) with increasing order $d_\ell$;
let $\{\omega^{(1)}(\gamma_i)\}, \dots, \{\omega^{(s)}(\gamma_i)\}$ be the representations of order 1. If $\mathcal{G}$ is an abelian group
one has $s = q = t$ and $d_1 = d_2 = \cdots = d_t = 1$. If $\mathcal{G}$ is a nonabelian group, it holds that
$s < q < t$ and therefore $d_1 = d_2 = \cdots = d_s = 1$, $2 \le d_{s+1} \le \cdots \le d_q$.
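Let $\mathcal{G}$ be an abelian group. We will call elementary restriction matrices the following
$t$ matrices, with 1 row and $t$ columns

$$R_{\ell 1} = \frac{1}{\sqrt{t}} \big(\omega^{(\ell)}(\gamma_1) \ \cdots \ \omega^{(\ell)}(\gamma_t)\big), \qquad \ell = 1, \dots, t. \tag{3.1}$$

Since the representations $\{\omega^{(\ell)}(\gamma_i)\}$ are real, it follows that $\omega^{(\ell)}(\gamma_i) = \pm 1$, for $i = 1, \dots, t$.

Let $\mathcal{G}$ be a nonabelian group. Corresponding to the representations $\{\omega^{(\ell)}(\gamma_i)\}$ of
order 1 of the system (2.1), we introduce matrices $R_{\ell 1}$ with 1 row and $t$ columns

$$R_{\ell 1} = \frac{1}{\sqrt{t}} \big(\omega^{(\ell)}(\gamma_1) \ \cdots \ \omega^{(\ell)}(\gamma_t)\big), \qquad \ell = 1, \dots, s. \tag{3.2}$$

We obtain, in this case, $s$ matrices. Let now $\{\omega^{(\ell)}(\gamma_i)\}$ be a representation of the system
(2.1) of order $d_\ell$, with $d_\ell \ge 2$. With $k = 1, \dots, d_\ell$ fixed, let us consider the following
matrix, with $d_\ell$ rows and $t$ columns

$$R_{\ell k} = \sqrt{\frac{d_\ell}{t}} \begin{pmatrix}
\omega^{(\ell)}_{1k}(\gamma_1) & \omega^{(\ell)}_{1k}(\gamma_2) & \cdots & \omega^{(\ell)}_{1k}(\gamma_t) \\
\omega^{(\ell)}_{2k}(\gamma_1) & \omega^{(\ell)}_{2k}(\gamma_2) & \cdots & \omega^{(\ell)}_{2k}(\gamma_t) \\
\vdots & \vdots & & \vdots \\
\omega^{(\ell)}_{d_\ell k}(\gamma_1) & \omega^{(\ell)}_{d_\ell k}(\gamma_2) & \cdots & \omega^{(\ell)}_{d_\ell k}(\gamma_t)
\end{pmatrix}. \tag{3.3}$$

Due to the orthogonality properties of the representation $\{\omega^{(\ell)}(\gamma_i)\}$, the matrix $R_{\ell k}$ has pairwise
orthonormal rows. Therefore the rank of the matrix $R_{\ell k}$ is $d_\ell$. For any representation
$\{\omega^{(\ell)}(\gamma_i)\}$ we obtain $d_\ell$ matrices $R_{\ell k}$ ($k = 1, \dots, d_\ell$). The matrices $R_{\ell k}$ ($\ell = 1, \dots, q$; $k =
1, \dots, d_\ell$) defined in (3.2) and (3.3) will be called elementary restriction matrices. The
total number of these matrices is $M$, with $M = d_1 + d_2 + \cdots + d_q$. The matrices defined in
(3.1) or (3.2)-(3.3) satisfy some properties, easily deducible from the orthogonality relations
(2.4), which we summarise in the following.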

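Theorem 3.1 ([1]) The $M$ elementary restriction matrices defined by (3.1) or (3.2)-(3.3)
verify the relations

$$R_{\ell k} R^*_{\ell k} = I_{d_\ell}, \qquad R_{\ell k} R^*_{\ell' k'} = 0 \ \text{ if } (\ell, k) \neq (\ell', k'), \qquad \sum_{\ell=1}^{q} \sum_{k=1}^{d_\ell} R^*_{\ell k} R_{\ell k} = I, \tag{3.4}$$

where $I_{d_\ell}$, $I$ are identity matrices of order $d_\ell$ and $t$ respectively.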
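As a concrete check of the relations (3.4) in a nonabelian case, the sketch below builds the elementary restriction matrices for the symmetry group of an equilateral triangle ($t = 6$, $q = 3$, $d_1 = d_2 = 1$, $d_3 = 2$, hence $M = 4$). The choice of group, the ordering of its elements and the helper names are assumptions made for illustration. The order-2 representation is taken to be the $2 \times 2$ orthogonal matrices themselves, and the matrices of order $d_\ell \ge 2$ carry the normalising factor $\sqrt{d_\ell / t}$ needed for orthonormal rows.

```python
import math

def rot(k):
    a = 2 * math.pi * k / 3
    return [[math.cos(a), -math.sin(a)], [math.sin(a), math.cos(a)]]

def mul(A, B):
    return [[sum(A[r][k] * B[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]

mirror = [[1.0, 0.0], [0.0, -1.0]]
# gamma_1..gamma_6: three rotations, then three reflections (assumed ordering).
elements = [rot(k) for k in range(3)] + [mul(rot(k), mirror) for k in range(3)]
t, d = len(elements), 2

def det(g):
    return g[0][0] * g[1][1] - g[0][1] * g[1][0]

# Order-1 representations: the trivial one and the determinant.
R = [[[1 / math.sqrt(t) for g in elements]],
     [[det(g) / math.sqrt(t) for g in elements]]]
# Order-2 representation: one matrix per column index k, rows indexed by j.
R += [[[math.sqrt(d / t) * g[j][k] for g in elements] for j in range(d)]
      for k in range(d)]

def gram(A, B):
    return [[sum(x * y for x, y in zip(ra, rb)) for rb in B] for ra in A]

def close(A, B, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(A, B) for x, y in zip(ra, rb))

def eye(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

# First two relations of (3.4): R R* = I on the diagonal, 0 off the diagonal.
ok_rows = all(
    close(gram(R[a], R[b]),
          eye(len(R[a])) if a == b else [[0.0] * len(R[b])] * len(R[a]))
    for a in range(len(R)) for b in range(len(R)))
# Third relation: the sum of R* R over all M matrices is the identity of order t.
resolution = [[sum(row[i] * row[j] for M in R for row in M) for j in range(t)]
              for i in range(t)]
ok_sum = close(resolution, eye(t))
print(len(R), ok_rows, ok_sum)
```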

4 $\mathcal{H}(\Sigma_a)$ spaces and elementary restriction matrices

Let $\Gamma$ be the piecewise smooth boundary of $\Omega$, invariant with respect to $\mathcal{G}$, and $a \in \Gamma$.
Consider the ordered set

$$\Sigma_a = \{a, \gamma_2^{-1} a, \dots, \gamma_t^{-1} a\}, \tag{4.1}$$

and the space $\mathcal{H}(\Sigma_a)$ of real functions defined in $\Sigma_a$. A natural basis $\mathcal{B}$ in $\mathcal{H}(\Sigma_a)$ is
formed by functions having value 1 in a point of $\Sigma_a$ and 0 in the remaining points.
Having indicated with $\chi$ the function of $\mathcal{B}$ with value 1 in the point $a$, we obtain the
ordered basis $\mathcal{B} = \{\chi(x), \chi(\gamma_2 x), \dots, \chi(\gamma_t x)\}$, such that, of course,

$$\mathcal{H}(\Sigma_a) = \operatorname{span}\{\chi(x), \chi(\gamma_2 x), \dots, \chi(\gamma_t x)\}. \tag{4.2}$$

$\mathcal{H}(\Sigma_a)$ is a vector space with finite dimension $n \le t$, invariant with respect to $\mathcal{G}$ (since
$\Sigma_a$ is invariant with respect to $\mathcal{G}$) and therefore decomposable into a direct sum of $M$
subspaces $\mathcal{H}_{\ell k}(\Sigma_a)$. Having set $n_\ell = \dim \mathcal{H}_{\ell k}(\Sigma_a)$, we have $n = \sum_{\ell=1}^{q} d_\ell n_\ell$.

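Definition 4.1 We say that $a$ is a generic point of $\Gamma$ (with respect to the group $\mathcal{G}$) if
$\dim \mathcal{H}(\Sigma_a) = t$ or, equivalently, if all the elements of $\Sigma_a$ are distinct.

The following results hold.

Theorem 4.2 ([1]) Having fixed any point $a \in \Gamma$, if $\{\omega^{(\ell)}(\gamma_i)\}$ is a representation of
order 1, then $\mathcal{H}_{\ell 1}(\Sigma_a) = \operatorname{span}\{P_{\ell 1}\chi\}$ and $n_\ell \le 1$. If $\{\omega^{(\ell)}(\gamma_i)\}$ is a representation of
order $d_\ell \ge 2$, one has

$$\mathcal{H}_{\ell k}(\Sigma_a) = \operatorname{span}\{A^{(\ell)}_{k1}\chi, \dots, A^{(\ell)}_{k d_\ell}\chi\}, \qquad k = 1, \dots, d_\ell, \tag{4.3}$$

and therefore $n_\ell \le d_\ell$. If $a$ is a generic point, then $n_\ell = d_\ell$ for any $\ell$.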

Let now $V^t$ be the column vector $(\chi(x), \chi(\gamma_2 x), \dots, \chi(\gamma_t x))^*$, whose order is related to
the one fixed for the elements of $\mathcal{G}$. Corresponding to the representations of order 1 of $\mathcal{G}$,
for the elementary restriction matrices defined in (3.1), (3.2) we have $R_{\ell 1} V^t = \sqrt{t}\, P_{\ell 1}\chi$.
From Theorem 4.2, it follows that

$$\mathcal{H}_{\ell 1}(\Sigma_a) = \operatorname{span}\{R_{\ell 1} V^t\}. \tag{4.4}$$

Corresponding to the representations of order $d_\ell \ge 2$, for the elementary restriction
matrices defined in (3.3) we have $R_{\ell k} V^t = \sqrt{t / d_\ell}\, \big(A^{(\ell)}_{k1}\chi, \dots, A^{(\ell)}_{k d_\ell}\chi\big)^*$. From
(4.3), it follows that

$$\mathcal{H}_{\ell k}(\Sigma_a) = \operatorname{span}\{R_{\ell k} V^t\}. \tag{4.5}$$

In both cases, if $a$ is a generic point, the components of the vector $R_{\ell k} V^t$ constitute a
basis in $\mathcal{H}_{\ell k}(\Sigma_a)$. Therefore, for any generic point $a$, the elementary restriction matrix
$R_{\ell k}$ represents the projection operator $P_{\ell k}$ from $\mathcal{H}(\Sigma_a)$ onto $\mathcal{H}_{\ell k}(\Sigma_a)$, if we choose $V^t$
as a basis in $\mathcal{H}(\Sigma_a)$.

Now, we want to construct elementary restriction matrices $R_{\ell k}$ which represent the
projection operators $P_{\ell k}$ from $\mathcal{H}(\Sigma_a)$ onto $\mathcal{H}_{\ell k}(\Sigma_a)$ for nongeneric points. Therefore let
us suppose $a$ to be a nongeneric point, i.e., such that the functions

$$\chi(x), \chi(\gamma_2 x), \dots, \chi(\gamma_t x) \tag{4.6}$$

are linearly dependent. Let $n$ be the maximum number of linearly independent functions
among (4.6) and let the following functions be linearly independent,

$$\chi(\gamma_{i_1} x), \dots, \chi(\gamma_{i_n} x). \tag{4.7}$$

It is convenient to order the functions (4.7) with increasing index $i_\alpha$; therefore let us
suppose $i_1 < i_2 < \cdots < i_n$. In this case the elementary restriction matrices $R_{\ell k}$ will have $n$
columns. The number $n_\ell$ of rows ($n_\ell \le d_\ell$) of each $R_{\ell k}$ is not determined by $i_1, \dots, i_n$.
In general, we can only say that the matrices $R_{\ell 1}, \dots, R_{\ell d_\ell}$ have the same number $n_\ell$ of rows,
where $n_\ell = \dim \mathcal{H}_{\ell k}(\Sigma_a)$.

We now consider a significant class of nongeneric points. Having fixed $l$ ($l =
2, \dots, t$), let $I_l(\Gamma)$ be the set of all points $a \in \Gamma$ such that

$$a = \gamma_l^{-1} a. \tag{4.8}$$

From (4.8) it follows, for any $i$: $\chi(\gamma_i x) = \chi(\gamma_l \gamma_i x)$. This implies that the functions (4.6)
are naturally subdivided into subsets and any subset contains coincident functions. Then
we can obtain elementary restriction matrices for the space $\mathcal{H}(\Sigma_a)$ with $a \in I_l(\Gamma)$ starting
from the elementary restriction matrices built in Section 3, with the following procedure.

• Let us add to each column of index $i_\alpha$ ($\alpha = 1, \dots, n$) all the columns of index $j$,
with $j$ such that $\gamma_j^{-1} a = \gamma_{i_\alpha}^{-1} a$. We indicate with $R'_{\ell k}$ the obtained matrices, all
with $d_\ell$ rows and $n$ columns, but not all full-rank matrices; some of these may be
zero matrices.

• Let us extract from the nonzero matrices $R'_{\ell k}$ submatrices $R''_{\ell k}$ made up of $n_\ell$ linearly
independent rows.

• Finally, let us construct from $R''_{\ell k}$ matrices $\hat R_{\ell k}$ with a row-orthonormalization procedure.

The (nonzero) matrices $\hat R_{\ell k}$ verify the properties expressed by Theorem 3.1. Furthermore,
the matrices $\hat R_{\ell k}$, applied to the vector $V^n = (\chi(\gamma_{i_1} x), \dots, \chi(\gamma_{i_n} x))^*$ corresponding
to a point $a \in I_l(\Gamma)$, give vectors whose components constitute a basis for $\mathcal{H}_{\ell k}(\Sigma_a)$.
For this reason they represent the projection operators from $\mathcal{H}(\Sigma_a)$ onto $\mathcal{H}_{\ell k}(\Sigma_a)$, for
any $a \in I_l(\Gamma)$. Then we will say that the matrices $\hat R_{\ell k}$, with $n_\ell$ rows and $n$ columns,
are elementary restriction matrices for the space $\mathcal{H}(\Sigma_a)$ relative to points $a \in I_l(\Gamma)$.
Furthermore $n = \sum_{\ell=1}^{q} d_\ell n_\ell$.
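The three-step procedure above can be sketched for one nongeneric point. The example is hypothetical and not taken from the text: the group is the symmetry group of an equilateral triangle ($t = 6$) and the point $a = (1, 0)$ lies on a mirror axis, so its orbit has only $n = 3$ points. As a simplification, the second and third steps (extracting independent rows and orthonormalising them) are merged into a single Gram-Schmidt pass that drops numerically dependent rows; this is a substitute technique chosen for brevity.

```python
import math

def rot(k):
    a = 2 * math.pi * k / 3
    return [[math.cos(a), -math.sin(a)], [math.sin(a), math.cos(a)]]

def mul(A, B):
    return [[sum(A[r][k] * B[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]

mirror = [[1.0, 0.0], [0.0, -1.0]]
elements = [rot(k) for k in range(3)] + [mul(rot(k), mirror) for k in range(3)]
t, d = len(elements), 2

def det(g):
    return g[0][0] * g[1][1] - g[0][1] * g[1][0]

# Elementary restriction matrices for a generic point, as in (3.2)-(3.3).
R = [[[1 / math.sqrt(t) for g in elements]],
     [[det(g) / math.sqrt(t) for g in elements]]]
R += [[[math.sqrt(d / t) * g[j][k] for g in elements] for j in range(d)]
      for k in range(d)]

a = (1.0, 0.0)  # hypothetical nongeneric point on the mirror axis

def inv_image(g):
    """gamma^{-1} a, using gamma^{-1} = gamma^T for an orthogonal matrix."""
    return (round(g[0][0] * a[0] + g[1][0] * a[1], 9),
            round(g[0][1] * a[0] + g[1][1] * a[1], 9))

orbit = []
for g in elements:
    if inv_image(g) not in orbit:
        orbit.append(inv_image(g))
n = len(orbit)

def restrict(M):
    # Step 1: add together the columns that correspond to the same orbit point.
    summed = [[sum(row[i] for i, g in enumerate(elements) if inv_image(g) == p)
               for p in orbit] for row in M]
    # Steps 2 and 3 merged: Gram-Schmidt, dropping dependent rows.
    basis = []
    for row in summed:
        for b in basis:
            c = sum(x * y for x, y in zip(row, b))
            row = [x - c * y for x, y in zip(row, b)]
        norm = math.sqrt(sum(x * x for x in row))
        if norm > 1e-12:
            basis.append([x / norm for x in row])
    return basis

restricted = [restrict(M) for M in R]
row_counts = [len(M) for M in restricted]
rows = [row for M in restricted for row in M]
print(n, row_counts)
```

For this point the matrix of the second order-1 representation becomes a zero matrix, and each of the two matrices of the order-2 representation keeps a single row, in agreement with $n = \sum_\ell d_\ell n_\ell = 1 + 0 + 2 = 3$.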

5 $\mathcal{H}(\Sigma)$ spaces and restriction matrices

Let $\Gamma$ be the piecewise smooth boundary of $\Omega$, and $\Sigma$ a set formed by $N$ points of $\Gamma$ constituting
a not necessarily uniform mesh defined on $\Gamma$. Let us suppose $\Gamma$ and $\Sigma$ invariant
with respect to $\mathcal{G}$. Let $\mathcal{H}(\Sigma)$ be the vector space of real functions defined in $\Sigma$. $\mathcal{H}(\Sigma)$ is
an $N$-dimensional vector space, invariant with respect to $\mathcal{G}$; this is due to the fact that $\Sigma$
is invariant with respect to $\mathcal{G}$. A natural basis $\mathcal{B}$ in $\mathcal{H}(\Sigma)$, invariant with respect to $\mathcal{G}$,
is formed by functions having value 1 in a point of $\Sigma$ and 0 in the remaining points. In
order to construct restriction matrices for the space $\mathcal{H}(\Sigma)$ more easily, or equivalently
for the mesh $\Sigma$, it is suitable to introduce in the set $\Sigma$ the following equivalence relation.

Definition 5.1 We say that a point $a'$ is equivalent to $a''$ if there exists an element
$\gamma_i \in \mathcal{G}$ such that $a'' = \gamma_i^{-1} a'$ (and therefore $a' = \gamma_i a''$).

The points of the set $\Sigma$ are then subdivided into $r$ equivalence classes. If $r = 1$ one
has $\mathcal{H}(\Sigma) = \mathcal{H}(\Sigma_a)$, with $a \in \Sigma$. Then let us suppose $r \ge 2$. We order the points of the
set $\Sigma$ as follows; having indicated with $a_1, \dots, a_r$ $r$ pairwise inequivalent points of $\Sigma$, we
consider the following ordered points

$$a_1, \gamma_2^{-1} a_1, \dots, \gamma_t^{-1} a_1, \ a_2, \gamma_2^{-1} a_2, \dots, \gamma_t^{-1} a_2, \ \dots, \ a_r, \gamma_2^{-1} a_r, \dots, \gamma_t^{-1} a_r. \tag{5.1}$$
If the points (5.1) are distinct, we have $N = rt$. If some points among (5.1) coincide, we will
erase from the sequence (5.1) a point if it is equal to a previous one. Then a sequence
of $N$ points, with $N < rt$, will remain, with $n^{(1)}$ points equivalent to $a_1$, $n^{(2)}$ equivalent
to $a_2$, ..., $n^{(r)}$ equivalent to $a_r$. In both cases $\mathcal{H}(\Sigma) = \mathcal{H}(\Sigma_{a_1}) \oplus \mathcal{H}(\Sigma_{a_2}) \oplus \cdots \oplus \mathcal{H}(\Sigma_{a_r})$,
with $\dim \mathcal{H}(\Sigma_{a_j}) = n^{(j)} \le t$, $j = 1, \dots, r$ and $N = n^{(1)} + n^{(2)} + \cdots + n^{(r)}$. We indicate
by $R^{(j)}_{\ell k}$ the elementary restriction matrices relative to the space $\mathcal{H}(\Sigma_{a_j})$, constructed as
indicated in Section 4. Let $n^{(j)}_\ell$ be the number of rows of the matrix $R^{(j)}_{\ell k}$; having fixed $j$,
the number of columns of the matrices $R^{(j)}_{\ell k}$, for any $\ell$ and $k$, is $n^{(j)}$. We consider therefore
the following $M$ block matrices

$$R_{\ell k} = \begin{pmatrix}
R^{(1)}_{\ell k} & O & \cdots & O \\
O & R^{(2)}_{\ell k} & \cdots & O \\
\vdots & \vdots & \ddots & \vdots \\
O & O & \cdots & R^{(r)}_{\ell k}
\end{pmatrix}$$

with $N_\ell = n^{(1)}_\ell + n^{(2)}_\ell + \cdots + n^{(r)}_\ell$ rows and $N$ columns, from which we have to eliminate the
possible zero rows. The matrices $R_{\ell k}$ determined by this procedure, which we call restriction
matrices for the space $\mathcal{H}(\Sigma)$ of dimension $N$, have rank equal to the number $N_\ell$ of the
remaining rows, and for these matrices the properties expressed in Theorem 3.1 still hold. In
both cases, we have the following theorem.

Theorem 5.2 Considering the basis $\mathcal{B}$ in $\mathcal{H}(\Sigma)$ as a column vector $V^N$ with the order
deduced from (5.1), the components of the vector $R_{\ell k} V^N$ form a basis in $\mathcal{H}_{\ell k}(\Sigma)$.
Therefore the $M$ matrices $R_{\ell k}$, having fixed in $\mathcal{H}(\Sigma)$ the ordered basis $V^N$, determine a
decomposition of $\mathcal{H}(\Sigma)$ in $M$ subspaces, which coincides with the one obtained with the
projection operators $P_{\ell k}$.

Preliminary  numerical  results  appear  promising;  algorithms  for  potential  and  linear 
elasticity  problems  are  being  implemented  on  parallel  processors  to  analyse  the  efficiency 
of  the  proposed  approach. 

Bibliography 

1. A. Aimi, L. Bassotti, and M. Diligenti, Groups of Congruences and Restriction
Matrices, submitted to BIT.

2. A. Aimi and M. Diligenti, Hypersingular kernel integration in 3D Galerkin boundary
element method, J. Comp. Appl. Math., 138, 1, (2002), 51-72.

3. L. Bassotti Rizza, Operatori lineari T-invarianti rispetto ad un gruppo di congruenze,
Ann. Mat. Pura ed Appl., 148, (1987), 173-205.

4. J. L. Lions and E. Magenes, Non-Homogeneous Boundary Value Problems and Applications I,
Springer-Verlag, Berlin, Heidelberg, New York, 1972.

5. V. I. Smirnov, Linear Algebra and Group Theory, McGraw Hill, New York, 1961.


The  numerical  simulation  of  the  qualitative  behaviour 
of  Volterra  integro-differential  equations 

John  T.  Edwards,  Neville  J.  Ford  and  Jason  A.  Roberts 
j.edwards@chester.ac.uk, njford@chester.ac.uk, j.roberts@chester.ac.uk

Chester College, Parkgate Road, Chester, CH1 4BJ, UK.


Abstract 

We  consider  the  qualitative  behaviour  of  exact  and  approximate  solutions  of  integral 
and  integro-differential  equations  with  fading  memory  kernels.  Over  long  time  intervals 
the  errors  in  numerical  schemes  may  become  so  large  that  they  mask  some  important 
properties  of  the  solution.  One  frequently  appeals  to  stability  theory  to  address  this 
weakness,  but  it  turns  out  that,  in  some  of  the  model  equations  we  have  considered, 
there  remains  a  gap  in  the  analysis. 

We consider a linear problem of the form

$$y'(t) = -\int_0^t e^{-\lambda(t-s)} y(s)\, ds, \qquad y(0) = 1,$$

and we solve the equation using simple numerical schemes. We outline the known stability
behaviour of the problem and derive the values of $\lambda$ at which the true solution
bifurcates. We give the corresponding analysis for the discrete schemes and highlight
that, for particular stepsizes, the methods give unexpected behaviour, and we show that,
as the step size of the numerical scheme decreases, the bifurcation points tend towards
those of the continuous scheme. We illustrate our results with some numerical examples.


1  Introduction 

The qualitative behaviour of numerical approximations to solutions of functional differential
equations is an important area for analysis. We aim to investigate whether the
behaviour  of  the  numerical  solution  reflects  accurately  that  of  the  true  solution.  We  are 
particularly  concerned  with  the  behaviour  of  the  solution  over  long  time  periods  when  (in 
particular)  the  convergence  order  of  the  method  gives  us  limited  insight,  since  the  error 
depends  on  a  constant  that  grows  with  the  time  interval.  Many  authors  are  concerned 
with  stability  of  solutions  and  of  their  numerical  approximations.  We  have  considered 
elsewhere  (see  [7])  the  stability  of  numerical  solutions  of  equations  of  this  type  (and  of 
non-linear  extensions).  This  analysis  raised  a  number  of  questions,  which  we  consider 
here,  about  just  how  well  the  full  range  of  qualitative  behaviour  of  even  quite  a  simple 
equation  is  understood. 

Bifurcations  (by  which  we  shall  mean  any  change  in  the  qualitative  behaviour  of 
solutions)  frequently  arise  only  for  systems  or  for  higher  order  problems  and  therefore 
one is particularly interested in finding suitable simple equations as the basis for analysis.
In this paper, we consider the solution by numerical techniques of the integro-differential
equation

$$y'(t) = -\int_0^t e^{-\lambda(t-s)} y(s)\, ds, \qquad y(0) = 1. \tag{1.1}$$

The equation is a linear convolution equation with separable fading memory convolution
kernel and therefore is a simple example from an important class of problems
familiar in applications. It is also possible to analyse the equation in the form of a second
order ordinary differential equation.
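The reduction of (1.1) to a second order ordinary differential equation can be sketched as follows, assuming the solution is twice differentiable. Differentiating (1.1) with the Leibniz rule and substituting (1.1) back into the result gives

```latex
\begin{align*}
y'(t)  &= -\int_0^t e^{-\lambda(t-s)}\, y(s)\, ds, \\
y''(t) &= -y(t) + \lambda \int_0^t e^{-\lambda(t-s)}\, y(s)\, ds
        = -y(t) - \lambda\, y'(t),
\end{align*}
```

so that $y'' + \lambda y' + y = 0$, with $y(0) = 1$ and, setting $t = 0$ in (1.1), $y'(0) = 0$. The characteristic equation is $r^2 + \lambda r + 1 = 0$, with roots $r = \big(-\lambda \pm \sqrt{\lambda^2 - 4}\big)/2$.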

The equation has several key properties that make it an ideal basis for our analysis:

1. it depends on the value of the single parameter $\lambda$,

2. when $\lambda$ varies through real values, four distinctive qualitative behaviours in the
solution can be detected, and

3. equations with exponential convolution kernels frequently arise in applications and
elsewhere in the literature.

For $\lambda$ real and positive, the kernel is of fading memory type. For $\lambda$ real and negative,
the kernel has a growing memory effect. This linear equation displays surprisingly rich
dynamical behaviour for real values of the parameter $\lambda$ and it is this behaviour that we
want to consider for the numerical scheme. We note that the classical test equation

$$y'(t) = g(t) + \xi y(t) + \eta \int_0^t y(s)\, ds, \qquad \eta \neq 0 \tag{1.2}$$

([1, 2]) displays the same range of qualitative behaviour possibilities as (1.1) for varying
values of the two real parameters $\xi, \eta$.

This motivates us to consider equation (1.1) as a prototype problem that is interesting
in its own right and that will also provide insight into the behaviour of more complicated
equations. We propose to give a further analysis, in which we consider the boundaries along
which bifurcations occur for equation (1.2), in a sequel [3].

We  consider  the  following  questions. 

1.  Does  the  numerical  scheme  display  the  same  four  qualitatively  different  types  of 
long  term  behaviour  as  are  found  in  the  true  solution? 

2. Are the interval ranges for the parameter $\lambda$ that give rise to the changes in behaviour
of the solution the same as in the original problem?


2  Behaviour  of  the  exact  solution 

We consider the equation (1.1) which can be shown to have a unique continuous solution
(see, for example, [10]). One can easily establish (by considering, for example, an
equivalent ordinary differential equation) the general solution

$$ y(t) = A e^{\frac{-\lambda + \sqrt{\lambda^2 - 4}}{2} t} + B e^{\frac{-\lambda - \sqrt{\lambda^2 - 4}}{2} t} \tag{2.1} $$

where $A$, $B$ are constants. For real values of $\lambda$ the solution to (1.1) bifurcates (or changes
qualitative behaviour) at $\lambda = 0, \pm 2$. We have the following qualitative behaviour.

A1. When $\lambda > 2$, $y \to 0$ as $t \to \infty$, with no oscillations.

A2. When $0 < \lambda < 2$, $y \to 0$ as $t \to \infty$, with infinitely many oscillations.

A3. When $\lambda = 0$, $y(t) = \cos(t)$ (persistent oscillations).

A4. When $-2 < \lambda < 0$, the solutions contain infinitely many oscillations of increasing
amplitude.

A5. When $\lambda < -2$, the solution grows (in magnitude) without any oscillations.
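The five regimes can be read off from the exponents in (2.1), which are the roots of $r^2 + \lambda r + 1 = 0$. The following is a minimal sketch of that classification; the function names are illustrative only, and the treatment of the boundary values $\lambda = \pm 2$ (grouped with A1 and A5) is an assumption, since the text lists strict inequalities.

```python
import cmath

def characteristic_roots(lam):
    """Roots of r**2 + lam*r + 1 = 0, i.e. the exponents appearing in (2.1)."""
    s = cmath.sqrt(lam * lam - 4.0)
    return (-lam + s) / 2.0, (-lam - s) / 2.0

def exact_behaviour(lam):
    """Return the label A1..A5 of the regime that the real parameter lam falls in."""
    r1, r2 = characteristic_roots(lam)
    rate = max(r1.real, r2.real)       # dominant exponential rate
    if lam * lam < 4.0:                # complex-conjugate exponents: oscillations
        if rate < 0.0:
            return "A2"
        return "A3" if rate == 0.0 else "A4"
    # Real exponents (|lam| >= 2): no oscillations. The boundary values
    # lam = 2 and lam = -2 are grouped with A1 and A5 here by assumption.
    return "A1" if rate < 0.0 else "A5"

for lam in (3.0, 1.0, 0.0, -1.0, -3.0):
    print(lam, exact_behaviour(lam))
```

The sign of the real part of the dominant exponent decides decay or growth, and a negative discriminant $\lambda^2 - 4$ decides whether oscillations are present.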


3  Numerical  analysis 

To apply a numerical method to an integro-differential equation of the type

$$ y'(t) = f\left(t, y(t), \int_0^t k(t, s, y(s))\,ds\right), \qquad y(0) = y_0, \tag{3.1} $$

we write the problem in the form

$$ y'(t) = f(t, y(t), z(t)), \tag{3.2} $$

$$ z(t) = \int_0^t k(t, s, y(s))\,ds. \tag{3.3} $$

We  solve  (3.2),  (3.3)  numerically  using  a  linear  multistep  method  for  solving  equation 
(3.2)  combined  with  a  suitable  quadrature  rule  for  deriving  approximate  values  of  z  from 
equation  (3.3)  (see  [2]).  Such  a  method  is  sometimes  known  as  a  DQ-method.  For  linear 
$k$-step methods, one also needs to provide a special starting procedure to generate the
additional $k - 1$ initial approximations to the solution that are not given in the equation
but  are  needed  by  the  multistep  method  on  its  first  application.  It  turns  out  that  one 
needs  to  choose  the  quadrature,  multi-step  method  and  starting  schemes  carefully  to 
ensure  that  the  resulting  method  is  of  an  appropriate  order  of  accuracy  for  the  work 
involved.  One  should  try  to  choose  schemes  of  the  same  orders  as  one  another  since  the 
order  of  the  overall  method  is  equal  to  the  lowest  of  the  orders  of  the  three  separate 
methods  (the  multistep  formula,  the  starting  value  scheme  and  the  quadrature)  used 
to  construct  it.  In  this  paper  we  have  chosen  to  focus  on  one-step  methods.  There  are 
two  reasons  for  this:  we  have  thereby  avoided  the  need  to  construct  special  starting 
procedures  which  would  make  our  analysis  more  complicated;  as  Wolkenfelt  showed  in 
[11],  methods  with  a  repetition  factor  of  1  (such  as  the  ones  we  consider)  are  always 
stable  and  we  also  draw  attention  (see  [9]  for  example),  to  the  fact  that  the  trapezoidal 
rule  is  an  A-stable  1-step  method. 

For a well-behaved numerical scheme for (3.2), (3.3), we would anticipate four intervals
(as with the continuous problem) of $\lambda$-values where the solutions to the discrete
scheme behave qualitatively differently. However we know from investigation of bifurcation
points for numerical solution of delay differential equations (see [12]) and indeed
from stability analysis of integro-differential equations, that the points at which the
qualitative behaviour of the solution changes may arise at the wrong values of the parameter.
Based on previous experience (see [6]) we would expect this difference to be dependent
upon the stepsize $h$ of the numerical method and on the choice of method itself. Furthermore
(see, for example [8], [12]), one might expect the bifurcation points of the discrete
scheme to approach the bifurcation points of the continuous problem as $h \to 0$ and one
could anticipate that, for a method of overall order $p$, the approximation of the true
bifurcation point by the bifurcation point of the numerical scheme would also be to $O(h^p)$.
We will show in this paper that (for $h \to 0$) the approximation of the bifurcation points
in the methods we have chosen is at least to the order of the method.

To keep the analysis reasonably simple, we consider the following discrete form of
(3.2). We use a linear $\theta$-method in each case so that we solve the system

$$ y_{n+1} = y_n + h\left(\theta_1 F_n + (1 - \theta_1) F_{n+1}\right), \qquad n = 0, 1, \ldots, \tag{3.4} $$

$$ F_n = f(nh, y_n, z_n), \tag{3.5} $$

$$ z_n = h \sum_{j=0}^{n-1} \left(\theta_2\, k(nh, jh, y_j) + (1 - \theta_2)\, k(nh, (j+1)h, y_{j+1})\right). \tag{3.6} $$

One could choose any combination of $\theta_i$, $0 \le \theta_i \le 1$, and a natural choice could be
$\theta_1 = \theta_2$. However, in order to start with a simple method where the algebraic problem
is tractable we have considered first the cases where $\theta_1 = 0$ and we consider a range of
values of $\theta_2$.

One solves equations of the form

$$ y_{n+1} - y_n = -h^2 \left( \theta_2 e^{-\lambda (n+1) h} y_0 + \sum_{j=1}^{n} e^{-\lambda (n+1-j) h} y_j + (1 - \theta_2) y_{n+1} \right), \qquad y_0 = y_1 = 1. \tag{3.7} $$

Note that we have used a simple procedure to find the additional starting value $y_1 = 1$.
We have observed from the integro-differential equation that $y'(0) = 0$ and have deduced
that $y(h) = y(0)$ will provide a reasonable order 1 starting approximation. This choice
of formula implies that we are combining a backward Euler scheme to discretise the
differential equation, with, respectively, (for $\theta_2 = 1$) the forward rectangular (Euler)
rule, (for $\theta_2 = \frac{1}{2}$) the trapezoidal rule and (for $\theta_2 = 0$) the backward rectangular rule
for the quadrature. We will return to consider other combinations of $\theta_1$, $\theta_2$ later.

The equation (3.7) is equivalent to

$$ \left(1 + h^2 (1 - \theta_2)\right) y_{n+2} + \left(h^2 \theta_2 e^{-\lambda h} - 1 - e^{-\lambda h}\right) y_{n+1} + e^{-\lambda h} y_n = 0. \tag{3.8} $$

The behaviour of the solution as $t \to \infty$ depends on the roots of the characteristic
equation

$$ \left(1 + h^2 (1 - \theta_2)\right) k^2 + \left(h^2 \theta_2 e^{-\lambda h} - 1 - e^{-\lambda h}\right) k + e^{-\lambda h} = 0. \tag{3.9} $$

Any solution of (3.8) will be asymptotically stable if both roots of (3.9) are of magnitude
less than one and unstable if either root of (3.9) has magnitude greater than one. The
solutions will contain (stable or unstable) oscillations when the roots of (3.9) are complex
or, indeed, when at least one root is negative. It follows from this (see [4]) that the
bifurcations occur as follows (for reasonably small $h > 0$).


J.  T .  Edwards,  N.  J.  Ford,  and  J.  A.  Roberts 


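B1. When

$$ \lambda > -\frac{1}{h} \ln\left( \frac{1 + 2h^2 - h^2\theta_2 - 2\sqrt{-h^2 \left(h^2\theta_2 - 1 - h^2\right)}}{h^4\theta_2^2 - 2h^2\theta_2 + 1} \right), $$

$y_n \to 0$ as $n \to \infty$ with no oscillations. This condition can be written in the simpler form

$$ \lambda > \frac{1}{h} \ln\left(1 + 2h^2 - h^2\theta_2 + 2\sqrt{-h^2 \left(h^2\theta_2 - 1 - h^2\right)}\right) $$

and we thank the anonymous referee for pointing out this simplification.

B2. When

$$ \frac{1}{h} \ln\left(\frac{1}{1 + h^2 - h^2\theta_2}\right) < \lambda < \frac{1}{h} \ln\left(1 + 2h^2 - h^2\theta_2 + 2\sqrt{-h^2 \left(h^2\theta_2 - 1 - h^2\right)}\right), $$

$y_n \to 0$ as $n \to \infty$, with infinitely many oscillations.

B3. When $\lambda = \frac{1}{h} \ln\left(\frac{1}{1 + h^2 - h^2\theta_2}\right)$ we obtain persistent oscillations.

B4. When

$$ \frac{1}{h} \ln\left(1 + 2h^2 - h^2\theta_2 - 2\sqrt{-h^2 \left(h^2\theta_2 - 1 - h^2\right)}\right) < \lambda < \frac{1}{h} \ln\left(\frac{1}{1 + h^2 - h^2\theta_2}\right), $$

the solutions contain infinitely many oscillations of increasing amplitude.

B5. When $\lambda < \frac{1}{h} \ln\left(1 + 2h^2 - h^2\theta_2 - 2\sqrt{-h^2 \left(h^2\theta_2 - 1 - h^2\right)}\right)$, the solution grows (in
magnitude) without any oscillations.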
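The conditions B1 to B5 can be checked against the roots of the characteristic equation (3.9) directly. The sketch below is illustrative rather than the authors' code: the function names are hypothetical, it assumes $\theta_1 = 0$ as in the text, and the threshold formulas assume $h^2 \theta_2 \neq 1$ so that the logarithms are defined.

```python
import cmath
import math

def char_roots(lam, h, theta2):
    """Roots k of (3.9): a*k**2 + b*k + E = 0 with E = exp(-lam*h)."""
    E = math.exp(-lam * h)
    a = 1.0 + h * h * (1.0 - theta2)
    b = h * h * theta2 * E - 1.0 - E
    s = cmath.sqrt(b * b - 4.0 * a * E)
    return (-b + s) / (2.0 * a), (-b - s) / (2.0 * a)

def thresholds(h, theta2):
    """Boundaries (B5|B4, B3, B2|B1) in lambda, in increasing order."""
    base = 1.0 + 2.0 * h * h - h * h * theta2
    root = 2.0 * math.sqrt(-h * h * (h * h * theta2 - 1.0 - h * h))
    lam1 = math.log(base + root) / h
    lam2 = math.log(1.0 / (1.0 + h * h - h * h * theta2)) / h
    lam3 = math.log(base - root) / h   # requires h*h*theta2 != 1
    return lam3, lam2, lam1

def discrete_behaviour(lam, h, theta2):
    """Return (oscillatory, growing) for the recurrence (3.8)."""
    k1, k2 = char_roots(lam, h, theta2)
    # Oscillations: complex roots, or at least one negative real root.
    oscillatory = k1.imag != 0.0 or min(k1.real, k2.real) < 0.0
    growing = max(abs(k1), abs(k2)) > 1.0
    return oscillatory, growing

h, theta2 = 0.1, 0.5
print(thresholds(h, theta2))
for lam in (3.0, 1.0, -1.0, -3.0):
    print(lam, discrete_behaviour(lam, h, theta2))
```

For small $h$ the three thresholds lie close to $-2$, $0$ and $2$, and the root-based classification changes exactly when $\lambda$ crosses them.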

4  Bifurcation  points  of  the  numerical  scheme  as  approximations 
to  true  bifurcation  points 

We consider now the way in which the bifurcation points of the discrete scheme
approximate those of the original problem. We are using a numerical scheme of order 1.

First we consider the value of

$$ \lambda_1 = \frac{1}{h} \ln\left(1 + 2h^2 - h^2\theta_2 + 2\sqrt{-h^2 \left(h^2\theta_2 - 1 - h^2\right)}\right) $$

as $\theta_2$ varies and $h \to 0$. It is easy to see that, as $h \to 0$, the value $\lambda_1$ satisfies $\lambda_1 \to 2$.
In fact we can give greater precision to this. We can show that $\lambda_1 = 2 - \theta_2 h + O(h^2)$
as $h \to 0$. This means that, for $\theta$-methods in general, the bifurcation point of our scheme
approximates the true value (2) to order 1 (the order of the method), as $h \to 0$. In the
particular case $\theta_2 = 0$ the approximation is to order 2.

For $\lambda_2 = \frac{1}{h} \ln\left(\frac{1}{1 + h^2 - h^2\theta_2}\right)$ it is straightforward to show that stability is lost at a
value of $\lambda$ that approximates the true value (0) to order 1 in general. In fact, for $\theta_2 = 1$,
the forward Euler scheme, the approximation is exact for all values of $h$.

The analysis of $\lambda_3 = \frac{1}{h} \ln\left(1 + 2h^2 - h^2\theta_2 - 2\sqrt{-h^2 \left(h^2\theta_2 - 1 - h^2\right)}\right)$ follows in
exactly the same way as for $\lambda_1$ and leads to an identical conclusion: the approximation of
the bifurcation point $\lambda = -2$ is in general to order 1 as $h \to 0$ and to order 2 if $\theta_2 = 0$.
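The stated orders of approximation can be observed numerically by halving $h$ and comparing errors. The following is a rough check under the assumption that the error behaves like $C h^p$ for small $h$; the helper names and the step size $h = 0.1$ are illustrative choices.

```python
import math

def lambda1(h, theta2):
    root = 2.0 * math.sqrt(h * h * (1.0 + h * h - h * h * theta2))
    return math.log(1.0 + 2.0 * h * h - h * h * theta2 + root) / h

def lambda2(h, theta2):
    return math.log(1.0 / (1.0 + h * h * (1.0 - theta2))) / h

def lambda3(h, theta2):
    root = 2.0 * math.sqrt(h * h * (1.0 + h * h - h * h * theta2))
    return math.log(1.0 + 2.0 * h * h - h * h * theta2 - root) / h

def observed_order(f, exact, h):
    """Estimate p in error ~ C*h**p from the errors at h and h/2."""
    e1 = abs(f(h) - exact)
    e2 = abs(f(h / 2.0) - exact)
    return math.log(e1 / e2, 2)

h = 0.1
print(observed_order(lambda s: lambda1(s, 0.5), 2.0, h))   # close to 1
print(observed_order(lambda s: lambda1(s, 0.0), 2.0, h))   # close to 2
print(observed_order(lambda s: lambda3(s, 0.5), -2.0, h))  # close to 1
print(observed_order(lambda s: lambda2(s, 0.0), 0.0, h))   # close to 1
print(lambda2(h, 1.0))                                     # stability boundary for theta2 = 1
```

With $\theta_2 = 0$ the expression for $\lambda_1$ reduces to $\frac{2}{h} \sinh^{-1} h$, which makes the second order behaviour visible from its Taylor expansion.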

We illustrate our results graphically. Each of the plots shown in Figure 1 illustrates,
for varying $h$, the ranges for the parameter $\lambda$ where

1. the solutions are unstable due to at least one real root greater than unity in magnitude
(the darkest region in the figures) (exponential growth if the root is positive, growing
oscillations if the root is negative),

2. the solutions are unstable due to growing oscillations (the next darkest region in the
figures),

3. the solutions are stable with asymptotically stable oscillations (the lightest region in
the figure), and

4. the solutions are stable with exponentially stable decay.

We  can  compare  with  the  right  hand  plot  in  Figure  2  which  shows  the  true  regions 
for  the  original  problem  and  we  can  make  the  following  observations. 

1. As $h \to 0$ the values of $\lambda$ at which changes in the behaviour occur approach the true
values. This coincides with our previous experience in delay equations (see [8]).

2. There is some extremely surprising behaviour for some values of $h > 0$.

(a) For the two values $\theta_2 = 0.5$ and $\theta_2 = 1$ we can see that the darkest region is in
two parts: in the upper part there is a negative real root of magnitude greater
than unity leading to exponentially growing oscillations in the solution; in the
lower part there is a positive real root of modulus greater than unity leading
to exponential growth in the solutions.

(b) There can be a critical value of $h > 0$ ($h = 1/\sqrt{\theta_2}$ when $\theta_2 > 0$) at which, for
apparently arbitrarily large $\lambda < 0$ the numerical solution displays oscillatory
behaviour.

(c)  There  can  be  an  additional  thin  region  (visible  only  in  larger  scale  versions 
of  the  plots)  between  the  darkest  and  lightest  regions  in  which  there  is  a  real 
negative  root  of  magnitude  less  than  unity  leading  to  decaying  oscillations. 

(d) For $\theta_2 = 0.5$ and $\theta_2 = 1$ the upper part of the darkest region indicates some
really strange behaviour: spurious oscillations may arise for arbitrarily large
negative values of $\lambda$ and even (see Figure 1) for some positive values of $\lambda$. Thus
we can have the situation (for example for $\lambda$ small and positive) where the true
solution tends to zero while the approximate solution exhibits oscillations of
growing magnitude. Alternatively, (for $\lambda$ large and negative) the true solution
could exhibit high index exponential growth while the approximate solution
exhibits oscillations. We draw attention also to the fact that, for $\theta_2 = 0.5$
and $\theta_2 = 1$ the stability boundary of the method is made up of parts of the
boundaries of two regions, making the prediction of behaviour for varying $h > 0$
particularly difficult.

We  believe  that  these  observations  justify  our  view  that  more  attention  needs  to 
be  paid  to  changes  in  qualitative  behaviour  other  than  stability  in  reaching  a  good 
understanding  of  the  behaviour  of  numerical  methods  for  problems  of  this  type. 
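A small experiment makes this kind of spurious behaviour concrete. The sketch below iterates the recurrence (3.8) with $y_0 = y_1 = 1$; the parameter values ($\lambda = 0.1$, $\theta_2 = 1$, and the step lengths $h = 2.2$ and $h = 0.1$) are illustrative choices made here, not values taken from the figures.

```python
import math

def theta_scheme(lam, h, theta2, steps):
    """Iterate (3.8) with theta_1 = 0 and y_0 = y_1 = 1; returns y_0 .. y_steps."""
    E = math.exp(-lam * h)
    a = 1.0 + h * h * (1.0 - theta2)
    b = h * h * theta2 * E - 1.0 - E
    y = [1.0, 1.0]
    for _ in range(steps - 1):
        y.append(-(b * y[-1] + E * y[-2]) / a)
    return y

# lam = 0.1 lies in 0 < lam < 2, so the true solution decays with oscillations.
coarse = theta_scheme(0.1, 2.2, 1.0, 40)    # large step: growing, sign-alternating values
fine = theta_scheme(0.1, 0.1, 1.0, 2000)    # small step: decaying values
print(coarse[:6])
print(abs(coarse[-1]), abs(fine[-1]))
```

With the large step one root of (3.9) is real and less than $-1$, so the computed values alternate in sign and grow, while the small step reproduces the decay of the true solution.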

We can consider next whether these observations are equally true for other choices of
numerical method. We present in Figure 2 plots revealing the qualitative behaviour of
solutions to equations (3.2), (3.3) with other choices of $\theta$-method. It is easy to see that,
even for combinations such as using the trapezium rule for both parts of the discretisation
(a method characterised by $\theta_1 = \theta_2 = 0.5$ and known to do very well at preserving the
stability boundary) there are problems in the preservation of other types of qualitative
behaviour when $h$ is not very small. Similarly, we can see that the choice $\theta_1 = \theta_2 = 1$
leads to a shrinking range (as $h$ increases) of values of $\lambda$ that lead to stable oscillatory
solutions.

Fig. 1. Bifurcation points as $h$ varies for $\theta_1 = 0$, $\theta_2 = 0, 0.5, 1$ respectively.

Fig. 2. Bifurcation points as $h$ varies for, respectively, $\theta_1 = \theta_2 = 0.5, 1$ and for the
analytical problem.


5  Alternative  approaches 

The  particular  equation  we  have  considered  can  be  formulated  as  an  integro-differential 
equation,  as  an  integral  equation  or  as  a  second  order  differential  equation.  We  have 
shown  in  [4]  that  the  interesting  and  somewhat  surprising  observations  about  numerical 
behaviour  that  we  made  in  the  previous  section  also  apply  in  these  other  formulations. 

6  Closing  remarks 

The  results  presented  in  this  paper  show  that  the  well-established  stability  theory  based 
on  the  analysis  of  equation  (1.2)  gives  only  a  very  limited  insight  into  the  qualitative 
behaviour  of  solutions  of  the  class  of  convolution  equations  with  exponential  memory 
kernel  that  we  have  considered  here.  We  have  observed  elsewhere  (see  [5,  6,  7])  that  the 
qualitative  behaviour  of  numerical  solutions  to  equations  of  this  type  may  have  surprising 
features  and  our  consideration  here  of  the  prototype  problem  (1.1)  illustrates  how  this 
unexpected  behaviour  may  arise.  We  have  seen  in  this  paper  how  oscillations  may  arise 
in  the  numerical  schemes  when  they  should  not,  and  how  in  other  cases  the  numerical 
schemes may suppress genuine oscillatory behaviour. When one seeks good methods based
on a stability analysis, the desire is to focus on those methods where the step-length $h > 0$
is not subject to some upper bound to ensure the stability of the method. However our
initial observations in this paper have shown that this may well prove an unreasonable
expectation when one is investigating these other changes in qualitative behaviour.

We believe that this paper introduces a range of worthwhile investigations in a field
that is still quite open. Space restrictions have prevented us from considering the
behaviour of more general methods in this paper and also from extending our analysis to
consider other problems. The results we have presented here show that, for these simple
methods at least, the bifurcation parameters are approximated in the numerical scheme
to at least the order of the method, for sufficiently small $h > 0$. It is also very clear
that, even for what appears to be a simple problem, the choice of numerical scheme and
the form in which the problem is presented provide us with a rich source of example
behaviour.

Abstract

We  consider  systems  of  delay  differential  equations  of  the  form 

$$ y'(t) = A(t) y(t - 1) $$

where $y \in \mathbb{R}^n$ and $A : \mathbb{R} \to \mathbb{R}^{n \times n}$. We investigate whether a numerical method can be
used  to  determine  whether  or  not  the  equation  has  so-called  small  solutions.  Our  work 
builds  on  recent  analysis  and  experimental  work  completed  in  the  scalar  case  and  we  are 
able  to  conclude  that,  at  least  when  A  is  a  suitable  periodic  matrix,  one  can  predict  small 
solutions  by  using  a  numerical  approximation  scheme  of  fixed  step  length. 


1  Introduction  and  basic  theory 

The analysis of delay differential equations, both analytically and numerically, is
well-established. One distinctive feature is that even a scalar delay differential equation is an
infinite dimensional problem. For, if $y$ satisfies

$$ y'(t) = b(t) y(t - 1) \tag{1.1} $$

the initial conditions that need to be specified take the form

$$ y(t) = \varphi(t), \qquad -1 \le t \le 0. \tag{1.2} $$

This  infinite  dimensionality  has  two  significant  implications  for  us: 

(1)  the  dimension  of  a  system  of  delay  equations  is  the  same  as  the  dimension  of  a 
scalar  delay  equation,  and 

(2)  the  range  of  dynamical  behaviour  among  solutions  of  delay  equations  is  far  wider 
than  would  be  the  case  for  ordinary  differential  equations. 

In the present paper we are investigating an infinite dimensional property (that of
possessing small solutions) where the analysis and results for systems need to be presented
quite separately from those for scalar equations because there are some interesting and
distinctive features.

One way in which delay equations may be analysed is to view the solution operator
as a dynamical system. The dimension of the dynamical system then inherits the infinite
dimensionality of the delay equation itself. Small solutions (those that satisfy $y(t) e^{\alpha t} \to 0$
as $t \to \infty$ for all values of the parameter $\alpha$) can arise in these infinite dimensional
problems but would not be observed in finite dimensional equations. They are important
because, when a delay equation has small solutions, the eigenfunctions and generalised
eigenfunctions of the solution map do not form a complete set. This means that some
standard analytical results do not hold and that particular care must be taken in solving
and analysing the equation.

The  easy  detection  of  problems  that  have  small  solutions  is  still,  in  general,  open,  but 
we  have  seen  [4,  5]  that  the  use  of  a  numerical  approximation  scheme  can  lead  to  good 
insights.  Here  we  approximate  the  delay  differential  equation  using  a  simple  numerical 
scheme  with  fixed  step  length  and  then  consider  the  spectrum  of  the  resulting  solution 
map. 

In  recent  work  (see,  for  example  [3,  5])  the  scalar  case  has  been  considered  with 
some  success.  We  have  been  able  to  see  that,  for  the  equation  (1.1)  with  b  periodic  of 
period  1,  we  can  detect  the  existence  of  small  solutions  by  exploring  the  (finitely  many) 
eigenvalues  of  the  numerical  scheme.  We  also  found  that  it  was  not  necessary  to  use  a 
sophisticated  numerical  scheme  for  the  investigation  and  this  has  justified  us  in  focussing 
on  the  trapezium  rule  as  the  numerical  method  in  this  paper. 

For  the  scalar  case  (1.1)  it  is  known  (see  for  example  [4,  5])  that,  when  b  satisfies  the 
periodicity  condition  b(t)  =  b(t  -  1),  then  non-trivial  small  solutions  arise  if  and  only  if 
the  function  b  changes  sign.  For  the  vector-valued  case  we  can  give  a  theorem,  recently 
proved  by  Verduyn  Lunel  ([11]). 

Theorem  1.1  Consider  the  equation 

y'(t)  =  A(t)y(t  -  1),  where  A(t)  =  A(t  -  1),  (1.3) 

and where $y \in \mathbb{R}^n$. The equation has small solutions if and only if at least one of the
eigenvalues $\lambda_i$ satisfies, for some $t$,

$$ \Re \lambda_i(t^-) \times \Re \lambda_i(t^+) < 0, \qquad \lambda_i(t) = 0. \tag{1.4} $$

Remark 1.2 We shall describe the property (1.4) using the words "an eigenvalue passes
through the origin". We note that, even for real matrices $A$, the eigenvalues may be complex
and it could be that a pair of complex conjugate eigenvalues will cross the $y$-axis
away from the origin. In this case the equation has small solutions only if there is some
other crossing of the $y$-axis by an eigenvalue where the crossing does take place at the
origin.

2  Numerical  methods  and  systems  of  order  two 

All  the  important  relevant  features  of  systems  of  delay  equations  turn  out  to  be  exhibited 
in  systems  of  two  equations  and  so  we  shall  focus  on  these  for  simplicity.  We  consider 
the  equation 

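$$ y'(t) = A(t) y(t - 1) \quad \text{for } A \in \mathbb{R}^{2 \times 2} \text{ and } y \in \mathbb{R}^2, \tag{2.1} $$

subject to $y(t) = \varphi(t)$ for $-1 \le t \le 0$ and we assume that $A(t) = A(t - 1)$ for all $t$.
We introduce

$$ A(t) = \begin{pmatrix} \alpha(t) & \beta(t) \\ \gamma(t) & \delta(t) \end{pmatrix}, \qquad y(t) = \begin{pmatrix} x_1(t) \\ x_2(t) \end{pmatrix}, \qquad \varphi(t) = \begin{pmatrix} \varphi_1(t) \\ \varphi_2(t) \end{pmatrix}. \tag{2.2} $$

We apply the trapezium rule with step length $h = \frac{1}{N}$ and introduce the approximations
$x_{1,j} \approx x_1(jh)$ and $x_{2,j} \approx x_2(jh)$, $j > 0$; $x_{1,j} = \varphi_1(jh)$, $x_{2,j} = \varphi_2(jh)$, $-N \le j \le 0$.

Set

$$ y_n = \left(x_{1,n}, x_{1,n-1}, \ldots, x_{1,n-N}, x_{2,n}, x_{2,n-1}, \ldots, x_{2,n-N}\right)^T. \tag{2.3} $$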

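We note that, as in the one-dimensional case (see [3, 4, 5]), we can write the numerical
scheme as $y_{n+1} = A(n) y_n$ where the matrix $A(n)$ now takes the form

$$
A(n) = \left(\begin{array}{cccccc|cccccc}
1 & 0 & \cdots & 0 & \frac{h}{2}\alpha_{n+1} & \frac{h}{2}\alpha_n & 0 & \cdots & \cdots & 0 & \frac{h}{2}\beta_{n+1} & \frac{h}{2}\beta_n \\
1 & 0 & \cdots & \cdots & \cdots & 0 & 0 & \cdots & \cdots & \cdots & \cdots & 0 \\
0 & 1 & \ddots & & & \vdots & \vdots & & & & & \vdots \\
\vdots & \ddots & \ddots & \ddots & & \vdots & \vdots & & & & & \vdots \\
0 & \cdots & \cdots & 0 & 1 & 0 & 0 & \cdots & \cdots & \cdots & \cdots & 0 \\
\hline
0 & \cdots & \cdots & 0 & \frac{h}{2}\gamma_{n+1} & \frac{h}{2}\gamma_n & 1 & 0 & \cdots & 0 & \frac{h}{2}\delta_{n+1} & \frac{h}{2}\delta_n \\
0 & \cdots & \cdots & \cdots & \cdots & 0 & 1 & 0 & \cdots & \cdots & \cdots & 0 \\
\vdots & & & & & \vdots & 0 & 1 & \ddots & & & \vdots \\
\vdots & & & & & \vdots & \vdots & \ddots & \ddots & \ddots & & \vdots \\
0 & \cdots & \cdots & \cdots & \cdots & 0 & 0 & \cdots & \cdots & 0 & 1 & 0
\end{array}\right), \tag{2.4}
$$

where $\alpha_n = \alpha(nh)$, and similarly for $\beta_n$, $\gamma_n$, $\delta_n$.

The sequence of matrices $\{A(n)\}$ is periodic, of period $N$ (since the function $A$ is
periodic of period 1) and $y_2 = A(1) y_1$, $y_3 = A(2) A(1) y_1$ and so on. Therefore $y_{N+1} = C y_1$
where $C = A(N) A(N-1) \cdots A(2) A(1)$.

Remark 2.1 The key to extending our discussion to larger systems, and indeed, to
gaining a full understanding of the approach, is to note that in both the matrix $A(n)$ and
the matrix $C$ the original block structure is retained. Therefore although the matrices
$A(n)$ and $C$ are considerably larger than the original $2 \times 2$ matrix $A(t)$ in the problem,
they are made up of 4 blocks in a $2 \times 2$ formation. Indeed the contents of each block
are completely determined by our numerical method (the trapezium rule) and the values
of the corresponding function, respectively $\alpha$, $\beta$, $\gamma$, $\delta$. There is no pollution of the blocks
from the neighbouring functions.

We consider three different cases:

(1) $\beta(t) = \gamma(t) = 0$ so that the matrix $A$ is diagonal,

(2) either $\beta(t) = 0$ or $\gamma(t) = 0$ so that the matrix $A$ is triangular, and

(3) the matrix $A$ is neither diagonal nor triangular.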

The first two cases can be dealt with quite quickly because of the fact that real
diagonal and triangular matrices have only real eigenvalues and these eigenvalues lie on
the diagonal. Therefore in these two cases we need consider only the question of whether
the eigenvalues pass through zero; we do not need to concern ourselves with possible
complex eigenvalues whose real parts change sign away from the origin.

We can go further: a diagonal matrix $A$ leads to a block diagonal matrix $A(n)$ (with
non-zero blocks top left and bottom right). Now by simple matrix theory we know
that the eigenvalues of such a matrix are simply the union of the eigenvalues of the
two blocks. A similar argument applies when there is a triangular matrix $A$ because
the matrices $A(n)$ are then block triangular. It follows that, for both of cases 1 and 2,
the 2-dimensional eigenvalue problem simply reduces to two 1-dimensional problems.
Therefore, when we consider the eigenspectra of the numerical schemes in cases 1 and
2, we expect the result to be the superposition of the eigenspectra from the two block
matrices on the diagonal of $C$.

Case  3  is  more  complicated  and  we  shall  return  to  it  after  we  give  brief  examples  of 
Cases  1  and  2. 
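The construction of $A(n)$ and of the product $C = A(N) \cdots A(1)$ can be sketched in a few lines. This is an illustrative implementation, not the authors' code: the function names are hypothetical, it assumes the ordering of unknowns in (2.3) and the convention $\alpha_n = \alpha(nh)$, and it uses plain Python lists to stay dependency-free.

```python
import math

def step_matrix(n, N, alpha, beta, gamma, delta):
    """A(n) in y_{n+1} = A(n) y_n for the trapezium rule with h = 1/N."""
    h = 1.0 / N
    m = N + 1                                  # stored values per component
    A = [[0.0] * (2 * m) for _ in range(2 * m)]
    t0, t1 = n * h, (n + 1) * h
    coeffs = ((alpha, beta), (gamma, delta))
    for p in range(2):                         # block row (component p)
        top = p * m
        A[top][top] += 1.0                     # x_{p,n+1} = x_{p,n} + ...
        for q in range(2):                     # block column (component q)
            f = coeffs[p][q]
            A[top][q * m + N - 1] += 0.5 * h * f(t1)   # multiplies x_{q,n+1-N}
            A[top][q * m + N] += 0.5 * h * f(t0)       # multiplies x_{q,n-N}
        for i in range(1, m):                  # shift the stored history
            A[top + i][top + i - 1] = 1.0
    return A

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def period_map(N, alpha, beta, gamma, delta):
    """C = A(N) A(N-1) ... A(2) A(1)."""
    C = step_matrix(1, N, alpha, beta, gamma, delta)
    for n in range(2, N + 1):
        C = matmul(step_matrix(n, N, alpha, beta, gamma, delta), C)
    return C

# Diagonal case: beta = gamma = 0, with illustrative alpha and delta.
zero = lambda t: 0.0
alpha = lambda t: math.sin(2 * math.pi * t) + 1.4
delta = lambda t: math.sin(2 * math.pi * t) + 0.5
C = period_map(8, alpha, zero, zero, delta)
print(len(C), len(C[0]))
```

In the diagonal case the off-diagonal blocks of $C$ remain zero, so its spectrum is the union of the spectra of the two diagonal blocks; the eigenvalues themselves could then be computed with a library routine such as `numpy.linalg.eigvals`.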

3  How  to  recognise  small  solutions:  our  previous  work 

Space  restrictions  here  prevent  us  from  giving  a  great  many  details  of  our  previous  work, 
but  we  provide  a  summary  to  show  how  the  current  investigation  builds  on  the  scalar 
case. In [3] we considered the eigenspectra of the matrix $C$. We showed that there were
three  characteristic  patterns  for  the  eigenspectra,  represented  by  Figure  1.  We  take  the 
presence  of  the  closed  loops  that  cross  the  x-axis  to  be  characteristic  of  the  cases  where 
small  solutions  arise. 


Fig. 1. Eigenspectra where $b(t)$ has no change of sign on $[0, 1]$ (left), where $b(t)$ has a
change of sign on $[0, 1]$ and $\int_0^1 b(s)\,ds = 0$ (centre), and where $b(t)$ has a change of sign
on $[0, 1]$ and $\int_0^1 b(s)\,ds \neq 0$ (right).


4 The cases when $\beta(t) = 0$ and/or $\gamma(t) = 0$

As we have remarked already, the eigenspectrum when $A$ is diagonal or triangular is just
the same as the eigenspectra of the block matrices from the diagonal of $C$. We expect to
find the eigenspectra superimposed, which is indeed what we see in the examples given.
Here we assume that at least one of $\gamma(t)$ or $\beta(t)$ is zero; the plots are then independent
of the values taken by the other.

Example 4.1 We solve (2.1) with the choice $\alpha(t) = \sin 2\pi t + 1.4$ and $\delta(t) = \sin 2\pi t + 0.5$.
Here $\alpha$ does not change sign but $\delta$ does change sign. We expect small solutions and Figure
2 provides confirmation.

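Example 4.2 Now we solve (2.1) with $\alpha(t) = \sin 2\pi t$ and

$$ \delta(t) = \begin{cases} -0.3 & \text{for } t \in (0, \tfrac{1}{2}], \\ 0.7 & \text{for } t \in (\tfrac{1}{2}, 1]. \end{cases} $$

This time both $\alpha$ and $\delta$ change sign and we expect small solutions (see Figure 2).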

Fig.  2.  Eigenspectra  for  Example  4.1  (left)  and  Example  4.2  (right). 

4.1  The  general  two  dimensional  case 

We now move on to consider the case when neither of $\beta(t)$, $\gamma(t)$ is identically zero. In
this situation the eigenvalues of $A(t)$ can be complex and so may cross the $y$-axis away
from the origin.

First, we recall that $\det(A)$ is the product of the eigenvalues of $A$ so that, by Theorem
1.1, it follows that $\det(A) = 0$ is a necessary condition for small solutions. However this
condition cannot be used to characterise equations where small solutions arise; if the
eigenvalues of $A$ are real and one passes through the origin, then $\det(A)$ will change
sign. If the eigenvalues of $A$ are a complex conjugate pair and cross the $y$-axis at the
origin then $\det(A)$ will instantaneously take the value zero but will otherwise remain
positive (the same behaviour as when a real eigenvalue becomes zero but does not change
sign). Therefore one cannot expect a change of sign in $\det(A)$ whenever there are small
solutions. The fact that the trace of $A$ is the sum of the eigenvalues of $A$ can be used to
characterise this case.

We  summarise.  For  a  real  matrix  A: 

(1) if $\det(A)$ changes sign then there are small solutions,

(2) if $\det(A)$ becomes zero instantaneously and $\operatorname{trace}(A)$ simultaneously changes sign
then there are small solutions,

(3) if $\det(A)$ becomes zero instantaneously and $\operatorname{trace}(A)$ does not simultaneously change
sign then there are no small solutions indicated.

Example 4.3 We first consider the case when the matrix $A$ takes the form

$$ A(t) = \begin{pmatrix} \sin 2\pi t + a & \sin 2\pi t + b \\ \sin 2\pi t + c & \sin 2\pi t + d \end{pmatrix}. $$

By judicious choice of the constants $a$, $b$, $c$, $d$ one can produce different types of
behaviour. One can see that $|A(t)| = (a + d - b - c) \sin 2\pi t + (ad - bc)$. We will illustrate
with the following choices of the constants.

Case 1: $a = 1.5$, $b = 0.7$, $c = 0.5$, $d = 0.5$ where the determinant changes sign,

Case 2: $a = -2$, $b = 0.8$, $c = 1.8$, $d = 0.7$ where, again, the determinant changes sign,

Case 3: $a = 1.6$, $b = 0.8$, $c = 1.8$, $d = 0.7$ where the determinant never becomes zero.

From the plots for Cases 1 and 2, we can easily see the presence of small solutions in
the eigenspectra shown in Figure 3. In Case 3, the eigenspectra in Figure 3 indicate
that, as expected, no small solutions are present.


Fig.  3.  Case  1.  Case  2.  Case  3 
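The determinant claims for the three cases of Example 4.3 can be checked from the formula $|A(t)| = (a + d - b - c) \sin 2\pi t + (ad - bc)$, since $\sin 2\pi t$ takes every value in $[-1, 1]$ over one period. The short sketch below uses illustrative function names.

```python
def det_range(a, b, c, d):
    """Minimum and maximum of det A(t) = (a+d-b-c)*sin(2*pi*t) + (a*d - b*c)."""
    slope = a + d - b - c
    const = a * d - b * c
    return const - abs(slope), const + abs(slope)

def det_changes_sign(a, b, c, d):
    lo, hi = det_range(a, b, c, d)
    return lo < 0.0 < hi

cases = {
    "Case 1": (1.5, 0.7, 0.5, 0.5),
    "Case 2": (-2.0, 0.8, 1.8, 0.7),
    "Case 3": (1.6, 0.8, 1.8, 0.7),
}
for name, consts in cases.items():
    print(name, det_range(*consts), det_changes_sign(*consts))
```

In Cases 1 and 2 the range of the determinant straddles zero, whereas in Case 3 it stays strictly negative, in line with the statements above.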


Example 4.4 Next, we consider the case when the matrix $A$ takes the form

$$ A(t) = \begin{pmatrix} \sin 2\pi t & -(\sin 2\pi t + b) \\ \sin 2\pi t + b & \sin 2\pi t \end{pmatrix}. $$

We choose the constant $b$ in the following ways.

Case 4: $b = 0$ so that $\det(A)$ becomes instantaneously zero at the same value that
$\operatorname{trace}(A)$ changes sign and the complex eigenvalues of $A$ cross the $y$-axis at the
origin,

Case 5: $b = 0.05$ so that the complex eigenvalues of $A$ cross the $y$-axis away from the
origin.
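The two cases can be verified by a short calculation, sketched here with the abbreviation $s = \sin 2\pi t$ (an abbreviation introduced only for this derivation).

```latex
\begin{align*}
\operatorname{trace} A(t) &= 2s, \qquad
\det A(t) = s^2 + (s + b)^2, \qquad
\lambda_{\pm}(t) = s \pm i\,(s + b), \\
\Re \lambda_{\pm} = 0 &\iff s = 0, \quad\text{where } \lambda_{\pm} = \pm i\, b, \\
b = 0 &: \quad \det A = 2 s^2 \ \text{touches zero exactly where the trace changes sign}, \\
b \neq 0 &: \quad \det A = 2\left(s + \tfrac{b}{2}\right)^2 + \tfrac{b^2}{2} \;\geq\; \tfrac{b^2}{2} > 0 .
\end{align*}
```

So for $b = 0$ the conjugate pair crosses the imaginary axis at the origin, while for $b = 0.05$ it crosses at $\pm 0.05\,i$ and the determinant never vanishes.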

Here we can see that the characteristic shapes familiar from our earlier work are
not reproduced and further investigation is called for. We remark that (in the zoomed
versions) the eigenspectrum where small solutions arise passes through the origin. This
property is reproduced also for all other examples that we have tried.

Example  4.5  Now  we  consider  the  case  when  the  matrix  A  takes  the  form 


N.  J.  Ford  and  P.  Lumb 


Fig.  4.  Left:  Case  4.  Right:  Case  5  and  (below)  zoomed  versions. 

for $t \in [-0.5, 0.5)$, $A(t) = A(t - 1)$ for $t > 0.5$; then it follows that $A$ has complex
eigenvalues that cross the $y$-axis at $y = b$ when $t = 0$. We plot the eigenspectra for

Case 6: $b = 0$ so the eigenvalues of $A$ cross the $y$-axis at the origin,

Case 7: $b = 0.01$ so the eigenvalues of $A$ cross the $y$-axis away from the origin.

Fig.  5.  Left:  Case  6.  Right:  Case  7  and  (below)  zoomed  versions 
5 Conclusions

We  have  seen  that  it  is  easy  to  extend  the  detection  of  small  solutions  by  numerical 
methods  from  one-dimensional  to  two-dimensional  problems  where  the  eigenvalues  are 
real.  Initial  experiments  indicate  that  the  method  works  also  for  problems  possessing 
complex eigenvalues, but here the patterns that arise in the eigenspectra plots are unfamiliar
and require further investigation. However, based on our experimental evidence,
it  seems  that  small  solutions  arise  in  the  latter  case  if  and  only  if  the  eigenspectra  plots 
pass  through  the  origin. 
Abstract

In this paper, we present a new technique for generating error-equidistributing meshes
that satisfy both local quasi-uniformity and a preset minimal mesh spacing. This is first
done in the one-dimensional case by extending the Kautsky and Nichols method [6] and
then in the two-dimensional case by generalizing the tensor product methods to alternating
curved-line equidistributions. With the new meshing approach, we have achieved
better accuracy in approximation using interpolatory radial basis functions (RBFs).
Furthermore, improved accuracy in numerical results has been obtained for a class of linear
and non-homogeneous PDEs solved by the dual reciprocity method (DRM).


1  Introduction 

Adaptive mesh algorithms have been widely used in the numerical solution of partial
differential equations (PDEs) for boundary value problems [1, 13]. One undesirable
feature of an error-equidistributing mesh is that there is no guarantee of it being
sufficiently smooth. For our applications of interpolation (using RBFs), the distance between
points becoming too small can imply that the underlying interpolation matrix becomes
ill-conditioned.

In  this  paper,  we  propose  a  method  to  deal  with  this  problem  in  Section  2.  Essentially 
our  method  consists  of  modifying  the  error  monitor  function  in  a  suitable  way  and 
then  equidistributing  the  new  function  so  that  the  minimal  mesh  size  constraint  can  be 
satisfied.  We  deal  with  the  extension  of  adaptive  mesh  to  two  dimensions  in  Section  3. 
Finally,  some  numerical  results  will  be  given  in  Section  4. 

2  An  adaptive  mesh  with  minimal  mesh  size  control 

In the 1D case, a typical adaptive mesh problem can be stated as follows: given a mesh
(uniform or non-uniform) $t_0, t_1, \ldots, t_m$, and its corresponding error values (usually
estimated from the numerical solution using a monitor function [5]) $f_0, f_1, \ldots, f_m$, we wish
to find a new mesh

$$\Pi:\ x_0, x_1, \ldots, x_n, \qquad (2.1)$$

that is locally bounded with respect to a positive constant $k > 1$ such that $1/k \le
h_j/h_{j-1} \le k$, $j = 1, 2, \ldots, n-1$, $h_j = x_{j+1} - x_j$, while the errors are equidistributed on
mesh $\Pi$. One solution to this problem was given in [6] by replacing $f_j$ by $\tilde f_j$ followed by a
standard equidistribution algorithm. $\tilde f_j$ is referred to as the padded function and the main
idea of replacing $f_j$ is to increase the values of the function $f$, where too small, to prevent
excessively large mesh sizes. We now propose a method of further modifying $\tilde f_j$ in such
a way that the resulting equidistribution mesh satisfies the preset minimal mesh size
$h_{min}$. Before proceeding, we consider replacing the piecewise linear function $\tilde f(x)$ (with
endpoint values $\tilde f_j = \tilde f(t_j)$) by another piecewise linear function $Z(x)$ (with endpoint
values $Z_j = \tilde f(x_j)$). This is a technical approximation to simplify the presentation;
the proposed method may in fact work without this step. Note that if we were to
equidistribute $Z(x)$, the resulting mesh would not differ much from $x_j$; define the average
value of the monitor function as


$$d' = d'(Z) = \frac{1}{n}\sum_{j=0}^{n-1} (Z_j + Z_{j+1})\,\frac{h_j}{2}. \qquad (2.2)$$


Our aim now is to modify some $Z_j$ values so that the modified average value is the same
as $d'$ while the modified values ensure that a preset minimal mesh size $h_{min}$ is satisfied. To
present our method, we note that insisting on $h_j \ge h_{min}$ implies $Z_j \le Z$ where

$$Z\,h_{min} = d' \qquad (2.3)$$

and $Z$ is the critical constant to realize $h_{min}$. This points to a way of modifying those large
values of $Z_j$. However it is not obvious how to ensure that the new and modified average
values are the same, i.e. equidistribution is maintained for the same error constant.
Suppose that among the current $Z_j$ values, there are $M + 1$ of them that are larger than
$Z$ (i.e. whose corresponding mesh size is less than $h_{min}$); denote these values by $Z_{k_j}$ for
$j = 0, 1, \ldots, M$. This means that $Z_{k_j} \le Z$ for $j = M+1, M+2, \ldots, n$. Here the sequence
$k_0, k_1, \ldots, k_n$ represents a permutation of $0, 1, 2, \ldots, n$.
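The quantities introduced so far can be computed in a few lines. The sketch below assumes that the average value (2.2) is the trapezoidal integral of the piecewise linear monitor divided by the number n of mesh intervals, and that the critical constant follows from Z h_min = d' as in (2.3). The function names and the small data set are hypothetical.

```python
def average_monitor(x, z):
    """d' = (1/n) * sum_j (Z_j + Z_{j+1}) * h_j / 2, with h_j = x_{j+1} - x_j."""
    n = len(x) - 1
    total = sum(0.5 * (z[j] + z[j + 1]) * (x[j + 1] - x[j]) for j in range(n))
    return total / n


def critical_constant(x, z, h_min):
    """Critical constant Z defined by Z * h_min = d'."""
    return average_monitor(x, z) / h_min


def indices_above(z, z_crit):
    """Indices k_0, ..., k_M of the values larger than Z, largest value first."""
    return sorted((j for j in range(len(z)) if z[j] > z_crit), key=lambda j: -z[j])


# Hypothetical data: a uniform mesh with one large monitor value in the middle.
x = [0.0, 0.25, 0.5, 0.75, 1.0]
z = [1.0, 1.0, 9.0, 1.0, 1.0]
z_crit = critical_constant(x, z, h_min=0.125)
print(z_crit, indices_above(z, z_crit))
```

For this data the trapezoidal integral is 3, so d' = 0.75 and Z = 6; only the middle value exceeds Z and would be modified.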

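It turns out that a suitable modification (from $Z_j$ to $\tilde Z_j$) is the following:

$$\tilde Z_{k_j} = \begin{cases} Z, & \text{if } Z_{k_j} > Z,\ j = 0, 1, \ldots, M, \\[4pt] Z_{k_j} + \dfrac{Z_{k_j}}{\sum_{i=M+1}^{n} Z_{k_i}}\, \dfrac{\sum_{i=0}^{M} (Z_{k_i} - Z)\,\bar h_{k_i}}{\bar h_{k_j}}, & \text{for } j = M+1, M+2, \ldots, n, \end{cases} \qquad (2.4)$$

where

$$\bar h_{k_i} = \begin{cases} (h_{k_i} + h_{k_i - 1})/2 & \text{when } k_i \ne 0, n, \\ h_0/2 & \text{when } k_i = 0, \\ h_{n-1}/2 & \text{when } k_i = n. \end{cases} \qquad (2.5)$$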
For a simple illustration, see the plot of Fig 3b. To prove that the above modification is
suitable, we first present the following result for a simple case.

Theorem 2.1 Let $x_0, x_1, \ldots, x_n$ be a non-uniform mesh with the mesh sizes $h_j =
x_{j+1} - x_j$ and let $Z_0, Z_1, \ldots, Z_n$ be the corresponding error values. If the critical constant
value $Z$ is as in (2.3), and only one value $Z_1 > Z$ (i.e. M = 1 and all other $Z_j$ are less
than or equal to $Z$), the modification (2.4) takes the following form,


(i)  Z0  =  Zo , 

(a)  Zj  =  Zj 


Zi  =  z, 


[( Z\  —  Z)(ho  +  h\)/2  /{hj  +  hj~ i )/2  for  j  —  2, 3, . .  ,,n. 


Then the average value $d = d(\tilde Z)$ of the modified values $\tilde Z_j$ is the same as $d' = d'(Z)$ in
(2.2).

Note M = 1 here; in fact the result holds for any one value $Z_j > Z$. Now we are ready
to present the main result on equation (2.4) with regard to minimal mesh size control.

Theorem 2.2 With the error function modified as in (2.4), the new mesh $h_j$ resulting
from equidistribution satisfies (i) the average error value remains as $d'$; (ii) $h_j \ge h_{min}$.
Here $h_{min}$ cannot be specified to be larger than $h = 1/n$ (the uniform mesh size);
in practice we found that $h_{min} \in [h^2, h/2]$ is adequate. Full proofs of these results will be
given in the full version of this paper [10].

In the method in (2.4), the values of $Z_{k_j}$ which are less than but close to $Z$ may
become unnecessarily large (e.g. larger than $Z$) and therefore we can propose a further
refinement. We can keep some of the $Z_{k_j}$ values which are between $Z/2$ and $Z$. In other
words, we only modify the very large and very small values of $Z_{k_j}$ (see plot of Fig 3b).
Then our theorems are still valid but the proofs may need minor changes. Finally we
summarise our adaptive method with minimal mesh size control as follows (see the plot
of Fig 3b for an illustration).

Algorithm 2.3. (Numerical algorithm) For a given non-uniform mesh $a = t_0, t_1, \ldots,
t_m = b$, the error values $f_0, f_1, \ldots, f_m$, and the values $c$ and $h_{min}$:

(1) Run the locally bounded mesh algorithm until it converges to the new mesh
$a = x_0 < x_1 < \ldots < x_n = b$ which is sub-equidistributing with respect to $c$ and $\tilde f$, that is, for a
sufficiently large value of the integer $n$ such that $\int_a^b \tilde f \le nc$, the inequalities

$$\int_{x_j}^{x_{j+1}} \tilde f \le c, \qquad j = 0, 1, \ldots, n-1,$$

are satisfied.

(2) Check the minimal mesh size and compare it with $h_{min}$. If it is less than $h_{min}$,
go to Step 3; otherwise stop.

(3) Approximate the padding values $Z_j = \tilde f(x_j)$ corresponding to the new mesh by using
piecewise linear interpolation of the $\tilde f_j$ values and calculate the average value

$$d' = \frac{1}{n}\sum_{j=0}^{n-1} (Z_j + Z_{j+1})\,\frac{h_j}{2},$$

where $h_j = x_{j+1} - x_j$, and $Z$ according to $Z\,h_{min} = d'$.

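(4) Obtain the decreasing arrangement $Z_{k_j}$ of the $Z_j$ by ordering them.

(5) Modify the $Z_{k_j}$ values as follows,

(i) $\tilde Z_{k_j} = Z$ when $Z_{k_j} > Z$,
assuming that $Z_{k_j} > Z$ for $j = 0, 1, \ldots, M$,

(ii) $\tilde Z_{k_j} = Z_{k_j}$ when $Z/2 \le Z_{k_j} \le Z$,
assuming that $Z/2 \le Z_{k_j} \le Z$ for $j = M+1, M+2, \ldots, N$,

(iii) $\tilde Z_{k_j} = Z_{k_j} + \dfrac{Z_{k_j}}{\sum_{i=N+1}^{n} Z_{k_i}} \Big[\sum_{i=0}^{M} (Z_{k_i} - Z)\,\bar h_{k_i}\Big] / \bar h_{k_j}$,
for $j = N+1, N+2, \ldots, n$,

where $\bar h_{k_j}$ was introduced in (2.5).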

(6) Check the modified values $\tilde Z_{k_j}$ in stage (iii) of Step 5. If $\tilde Z_{k_j} < Z/2$ for all
j, go to Step 7; otherwise repeat Step 5.

(7)  Perform  the  equidistribution  procedure  for  the  modified  values  Zkj  and  obtain  the 
new  adapting  mesh. 
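The overall effect of the algorithm can be illustrated with a simplified stand-in. The sketch below caps the monitor values at the critical constant Z and hands the removed "mass" to the remaining nodes in proportion to their headroom Z - Z_j. This redistribution rule is our own simplification, chosen because it cannot overshoot Z; it is not formula (2.4) and it omits the Z/2 refinement. The capped monitor is then equidistributed exactly for a piecewise linear function. All function names and the test monitor are assumptions.

```python
import math


def node_weights(x):
    """Trapezoidal node weights: half the sum of the neighbouring mesh sizes."""
    n = len(x) - 1
    return [0.5 * ((x[j] - x[j - 1] if j > 0 else 0.0)
                   + (x[j + 1] - x[j] if j < n else 0.0))
            for j in range(n + 1)]


def cap_monitor(x, z, h_min):
    """Cap values at Z = d'/h_min while keeping the trapezoidal integral unchanged."""
    w = node_weights(x)
    n = len(x) - 1
    total = sum(wj * zj for wj, zj in zip(w, z))
    z_crit = total / n / h_min
    removed = sum(wj * (zj - z_crit) for wj, zj in zip(w, z) if zj > z_crit)
    room = sum(wj * (z_crit - zj) for wj, zj in zip(w, z) if zj <= z_crit)
    if removed > room:
        raise ValueError("h_min is too large for this mesh")
    t = removed / room if room > 0.0 else 0.0
    return [z_crit if zj > z_crit else zj + t * (z_crit - zj) for zj in z]


def equidistribute(x, z, n_cells):
    """Mesh whose cells carry equal integrals of the piecewise linear monitor z > 0."""
    cum = [0.0]
    for j in range(len(x) - 1):
        cum.append(cum[-1] + 0.5 * (z[j] + z[j + 1]) * (x[j + 1] - x[j]))
    mesh, j = [x[0]], 0
    for k in range(1, n_cells):
        target = cum[-1] * k / n_cells
        while cum[j + 1] < target:
            j += 1
        r = target - cum[j]                       # integral still needed in segment j
        slope = (z[j + 1] - z[j]) / (x[j + 1] - x[j])
        disc = max(z[j] * z[j] + 2.0 * slope * r, 0.0)
        mesh.append(x[j] + 2.0 * r / (z[j] + math.sqrt(disc)))
    mesh.append(x[-1])
    return mesh


n = 20
x = [j / n for j in range(n + 1)]
z = [1.0 + 50.0 * math.exp(-200.0 * (xj - 0.5) ** 2) for xj in x]
h_min = 0.5 / n                                   # half the uniform mesh size
plain = equidistribute(x, z, n)
capped = equidistribute(x, cap_monitor(x, z, h_min), n)
print(min(b - a for a, b in zip(plain, plain[1:])))
print(min(b - a for a, b in zip(capped, capped[1:])))
```

Because every cell of the capped mesh carries the integral d' and the capped monitor never exceeds Z, each cell has length at least d'/Z = h_min, which is the property asserted in Theorem 2.2. The feasibility check mirrors the remark that h_min cannot exceed the uniform mesh size.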


3  Extension  to  two  dimensions 

The concept of adapting mesh in one dimension is well known (see e.g. [5, 3]). Extension
of this idea to two dimensions is not straightforward. For a given function $f(x, y)$ and
2D domain $\Omega$, an obvious extension is dividing the domain into some subdomains $\Omega_i$
in such a way that

$$\iint_{\Omega_i} f(x, y)\,dx\,dy = \text{constant}. \qquad (3.1)$$


Fig.  1 .  In  Fig  (a)  the  monitor  values  corresponding  to  the  new  mesh  are  represented  by 
the  linear  interpolation  for  these  values  is  shown  by  4.’  and  in  Fig  (b)  the  modified 
values  of  the  padded  function,  represented  by  dash  line,  are  compared  with  the  original 
values. 
Fig. 2. In Fig (a) equidistribution of slabs in the two coordinate directions and in Fig
(b) three stages of the new method are shown.

But,  such  a  partition  is  not  unique  and  furthermore  satisfying  condition  (3.1)  properly 
is  not  simple.  Consequently,  this  condition  has  to  be  replaced.  Among  the  methods 
given  to  satisfy  the  condition  (3.1)  as  much  as  possible,  two  well  known  methods  are 
transformation  and  dimension  reduction.  Transformation  methods  are  based  on  mapping 
the  physical  domain  into  a  simple  domain  with  a  uniform  mesh  and  ultimately  applying 
the  equidistribution  condition  to  obtain  an  adapting  mesh  in  the  physical  domain  [4,  12]. 
These  methods  are  generally  costly  and  complicated  in  theory.  In  this  work  we  first 
consider  the  latter  method  which  is  easier  and  cheaper  than  the  former  method.  We 
then  present  a  new  technique  to  generate  a  2D  mesh. 

3.1  Dimensions  reduction 

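We assume that $\Omega$ is a rectangle in the form $\Omega = \{(x, y):\ a \le x \le b,\ c \le y \le d\}$. A
simple idea is to produce the mesh,

$$a = x_0 < x_1 < \ldots < x_{n-1} < x_n = b,$$

$$c = y_0 < y_1 < \ldots < y_{m-1} < y_m = d,$$

such that

$$\int_{x_j}^{x_{j+1}} \int_c^d f_x(x, y)\,dy\,dx = \text{constant}, \qquad (3.2)$$

and

$$\int_{y_i}^{y_{i+1}} \int_a^b f_y(x, y)\,dx\,dy = \text{constant}, \qquad (3.3)$$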

where $f_x(x, y)$ and $f_y(x, y)$ are the monitors in the x and y directions respectively (see
Fig 3.1a). Obviously the mesh generated by this method is quite different from an
equidistributing mesh that one expects from (3.1). Another method which leads to a
non-rectangular grid is dimensional splitting [11]. We now describe a new method of
dimension-reduction type.
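The dimension-reduction idea of (3.2) and (3.3) can be sketched as follows: each monitor is integrated over the other coordinate, and the resulting one-dimensional density is equidistributed. The sketch assumes a rectangular domain, trapezoidal quadrature on a fine sampling grid, and linear interpolation of the cumulative integral; the monitor used in the example and all function names are hypothetical.

```python
import math


def trapezoid(values, points):
    """Trapezoidal rule for samples `values` taken at `points`."""
    return sum(0.5 * (values[i] + values[i + 1]) * (points[i + 1] - points[i])
               for i in range(len(points) - 1))


def equidistribute_1d(grid, density, n_cells):
    """Approximate equidistribution via linear interpolation of the cumulative integral."""
    cum = [0.0]
    for i in range(len(grid) - 1):
        cum.append(cum[-1] + 0.5 * (density[i] + density[i + 1]) * (grid[i + 1] - grid[i]))
    mesh, i = [grid[0]], 0
    for k in range(1, n_cells):
        target = cum[-1] * k / n_cells
        while cum[i + 1] < target:
            i += 1
        frac = (target - cum[i]) / (cum[i + 1] - cum[i])
        mesh.append(grid[i] + frac * (grid[i + 1] - grid[i]))
    mesh.append(grid[-1])
    return mesh


def dimension_reduction_mesh(fx, fy, a, b, c, d, n, m, samples=201):
    """x- and y-meshes from the marginal integrals of the monitors fx and fy."""
    xs = [a + (b - a) * i / (samples - 1) for i in range(samples)]
    ys = [c + (d - c) * i / (samples - 1) for i in range(samples)]
    gx = [trapezoid([fx(x, y) for y in ys], ys) for x in xs]   # integrate over y
    gy = [trapezoid([fy(x, y) for x in xs], xs) for y in ys]   # integrate over x
    return equidistribute_1d(xs, gx, n), equidistribute_1d(ys, gy, m)


def monitor(x, y):
    """Hypothetical positive monitor with a steep feature along the line x = 0."""
    return 1.0 + 20.0 * math.exp(-50.0 * x * x)


x_mesh, y_mesh = dimension_reduction_mesh(monitor, monitor, -1.0, 1.0, -1.0, 1.0, 8, 8)
print(x_mesh)
print(y_mesh)
```

With this monitor the x-mesh clusters around x = 0 while the y-mesh stays uniform, which shows why a tensor-product mesh of this kind can differ markedly from a mesh satisfying (3.1).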
Fig. 3. In Fig (a) the mesh generated by the new method for the function in (3.6) and in
Fig (b) the resulting mesh when restricting the minimal mesh size to $h_{min} = h/2$ for
the same function are shown.


3.2  A  new  approach  for  a  2D  mesh 

The idea is based on the tensor product method and therefore a non-rectangular grid.
We start with a uniform mesh in a rectangular region $\Omega$ and perform the method in
three stages. In the first stage, the error equidistribution is performed for each line in
the horizontal direction (see the first part of Fig 3.1b), that is,

$$\int_{x_j}^{x_{j+1}} f_x(x, y_i)\,dx = \text{constant} \quad \text{for } i = 0, 1, \ldots, m. \qquad (3.4)$$


In the next stage, the mesh is redistributed in the vertical direction along the new grid
lines (see the second part of Fig 3.1b), that is,

$$\int_{s_i}^{s_{i+1}} f_y(x_j, y)\,dy = \text{constant} \quad \text{for } j = 0, 1, \ldots, n, \qquad (3.5)$$


where $s_{i+1} - s_i$ is the distance between two consecutive points $(x_j, y_i)$ and $(x_j, y_{i+1})$
along the new lines. In the final stage, equidistribution is repeated in the horizontal
direction along the grid lines (the last part of Fig 3.1b). One can observe that repeating
this procedure usually leads to a convergent mesh. According to our experiments, the
number of iterations to achieve convergence is at most five. The mesh resulting from this
procedure for the function

$$u(x, y) = e^{(4 - x^2 - 4y^2)} \qquad (3.6)$$


when  applying  the  arc-length  monitor  is  shown  in  Figure  3a.  The  idea  of  controlling 
the  mesh  size  can  also  be  applied  in  this  technique.  The  generated  mesh  for  the  same 
function  when  the  mesh  sizes  are  restricted  to  hmin  =  h/2 ,  where  h  is  the  mesh  size  in 
the  case  of  uniform  mesh,  is  given  in  Figure  3b. 

4  Numerical  examples 

In this part the effect of adapting the mesh on the accuracy of interpolation and the DRM
is considered. In the following examples, the infinity norm has been used to measure the

Fig. 4. The resulting meshes when using the new method for the functions in Examples 1 and
2 are shown in Figures (a) and (b) respectively.


Method                         Stage   Function (E1)  Derivative  Function (E2)  Derivative
Uniform mesh                   -       5.1E-2         9.5E-1      1.3E-2         2.2E-1
Adaptive mesh with control     first   5.4E-3         1.6E-1      2.5E-3         1.3E-2
                               second  5.4E-3         3.0E-1      2.1E-3         1.0E-1
                               third   3.8E-3         3.0E-1      3.7E-3         1.0E-1
Adaptive mesh without control  first   1.4E-2         9.9E-2      2.5E-3         1.5E-2
                               second  2.2E-2         7.5E-1      2.1E-3         1.0E-1
                               third   1.8E-2         6.0E-1      4.5E-3         1.2E-1

Tab. 1. The interpolation error for Examples 1-2 using the adaptive mesh with and
without control of the mesh sizes.


accuracy, that is, if $u$ and $\tilde u$ are the exact and approximate values respectively then the
error is calculated as

$$e_u = \|u(x) - \tilde u(x)\|_\infty = \max_{x \in D} |u(x) - \tilde u(x)|.$$

A polynomial RBF, $1 + r^3$, has been employed in this work.
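In practice the maximum over D is taken over a finite set of test points. The following minimal sketch shows this discrete error measure; the function name `max_error` and the toy pair of exact and approximate functions are our own and are not the RBF interpolants used in the examples.

```python
def max_error(exact, approx, points):
    """Discrete infinity-norm error: the largest |u - u~| over the test points."""
    return max(abs(exact(p) - approx(p)) for p in points)


# Toy illustration: u(x) = x^2 approximated on [0, 1] by the straight line u~(x) = x.
points = [i / 100 for i in range(101)]
print(max_error(lambda x: x * x, lambda x: x, points))
```

For this toy pair the largest deviation occurs at x = 0.5. In two dimensions the same function applies unchanged if `points` holds coordinate pairs and the callables accept a pair.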

Example  4.1  We  check  the  interpolation  in  terms  of  the  RBFs  for  the  function, 

$$u(x, y) = (1 - e^{3x-3})\,\sin(1.5\pi y), \qquad (4.1)$$

in  a  rectangular  domain.  The  generated  mesh  for  this  function  is  shown  in  Figure  4a. 

Table 1 shows the effect of adapting the mesh on the interpolation accuracy with and without
controlling the mesh sizes. As one can observe, using the adapting mesh considerably
improves the accuracy in comparison with the case of uniform mesh. Moreover, the result
in the case of controlling the minimal mesh size is better.


Adaptive  mesh  algorithm  with  distance  control 

Example 4.2 In this example we first check the function $f_2(x, y) = 0.5 - 0.5\tanh(-4 +
16x^2 + 16y^2)$ and then solve the linear PDE: V2^ + + x7fjj + XVU ~ with the

Dirichlet  boundary  condition  over  the  elliptic  domain  x 2  +  4 y2  =  4,  where  d  is  a  known 
function  such  that  the  exact  solution  is  u{x,y)  =  ?/). 

Again from Table 1, we see improved approximation. We apply the DRM [7] to obtain the
solution, where the domain integrals are approximated by using RBF interpolation. The
adaptive mesh for this function is given in Fig. 4b and has been observed to give rise to
an improved DRM solution.

5  Conclusions 

We considered a new algorithm for producing a locally bounded mesh with a preset
minimal mesh size. Such a mesh is used to overcome the ill-conditioning problems associated
with radial basis function interpolation. Extension of the idea to the 2D case is
also considered. Some preliminary and improved numerical results are given.
Abstract

At present the use of hypercomplex methods is pursued by a growing number of mathematicians,
physicists and engineers. Quaternionic and Clifford calculus can be applied to
wide classes of problems in very different fields of science. We explain Maxwell equations
within the geometric algebras of real and complex quaternions. The connection between
Maxwell equations and the Dirac equation will be elaborated. Using the Teodorescu
transform we will deduce an iteration procedure for solving weak time-dependent Maxwell
equations in isotropic homogeneous media. Assuming the so-called Drude-Born-Feodorov
constitutive laws, Maxwell equations in chiral media are deduced. Full time-dependent
problems will be reduced to the consideration of Weyl operators.

1 Historically oriented introduction

Classical Maxwell equations were discovered in the second half of the nineteenth century
as a result of the stormy development of electromagnetic research at that time. The study
of these equations has attracted generations of physicists and mathematicians but some
of their secrets are still hidden.

At about the same time, new algebraic structures were also invented. W.R. Hamilton
discovered in 1843 the algebra of real quaternions as a generalization of the field of
complex numbers. Under the influence of H. Grassmann's extension theory and Hamilton's
quaternions, W.K. Clifford created in 1878 a geometric algebra, which is nowadays called
Clifford algebra. Its construction starts with a basis in the signed $\mathbb{R}^n = \mathbb{R}^{p,q}$ with units
$e_1, \ldots, e_n$. Assume that $e_i^2 = -1$, for $i = 1, \ldots$, and $e_j^2 = 1$, for $j = 1, \ldots, p$, as well as the
anticommutator relation

$$e_i e_j + e_j e_i = 0$$

for $i \ne j$. Together with $e_0 = 1$ one can construct a basis in the $2^n$-dimensional standard
Clifford algebra $Cl_{p,q}$. Incidentally, in 1954 C. Chevalley [5] showed that each Clifford
number, i.e. each element of $Cl_{p,q}$, can be identified with an antisymmetric tensor.

Let  us  go  back  to  the  electromagnetic  field  equations.  Already  J.  C.  Maxwell  [15] 
himself  and  W.  R.  Hamilton  [10]  used  these  new  algebraic  techniques  to  try  to  simplify 
Maxwell’s equations. The aim was to obtain an equation of the type

$$Du + au = F$$

with suitable operators D and a. For this reason Hamilton introduced his “Nabla operator”
as well as the notion “vector”. The tendency towards algebraisation of physics continued
in the first half of the last century. A long list of important publications was devoted
to this topic. We only stress here some of the milestones, beginning with the “Theory
of Relativity” by L. Silberstein (1914) [18], and H. Weyl’s book “Raum-Zeit-Materie” of
1921. Important results of Einstein/Mayer, Lanczos and Proca followed. In 1935 this
development culminated in the thesis of M. Mercier (Geneva) [16]. After the reinvention
of the concept of “spinors”, which first appeared in 1911 in a paper by E. Cartan, D. Hestenes
[11, 12, 13], F. Bolinder [3] and M. Riesz [17] wrote fundamental algebra papers with
applications in electromagnetic theory, using the framework of Clifford numbers and
spinor spaces.

Meanwhile, in the late thirties the famous Swiss mathematician R. Fueter and his co-workers
and followers used a function-theoretic approach for the same problems. These
ideas were refreshed and fruitfully extended by R. Delanghe and his group and A. Sudbery
in the seventies and early eighties (cf. [4, 20]). Influenced by the success of complex
analysis and Vekua theory, a generalized operator theory with corresponding singular
integral operators [19] and a corresponding hypercomplex theory for boundary value
problems of elliptic partial differential equations were developed [8], [9].

Making  use  of  a  transformation  of  Maxwell’s  equations  into  a  system  of  homogeneous 
coordinates  we  will  propose  an  alternative  solution  method. 

2  Maxwell  equations 

Let G be a bounded domain with sufficiently smooth boundary $\Gamma$ that is filled with
an isotropic homogeneous material.

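Using Gauss units Maxwell equations read as follows:

$$c\,\operatorname{rot} H = 4\pi J + \partial_t D \quad \text{(Biot-Savart-Ampère's law)}$$
$$c\,\operatorname{rot} E = -\partial_t B \quad \text{(Faraday's law)}$$
$$\operatorname{div} D = 4\pi\rho \quad \text{(Coulomb's law)}$$
$$\operatorname{div} B = 0 \quad \text{(no free magnetic charge)}$$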

Furthermore, the continuity condition has to be fulfilled:

$$\operatorname{div} J = -\partial_t \rho,$$

where $E = E(t, x)$ is the electric field, $H = H(t, x)$ the magnetic field, $J = J(t, x)$ the
electric current density, $D = D(t, x)$ the electric flux density, $B = B(t, x)$ the magnetic
flux density, $\rho = \rho(t, x)$ the charge density, and c is the speed of light in a vacuum.

The relations between flux densities and the electric and magnetic fields depend on
the material. It is well known that, for instance, all organic materials contain carbon and
realize in this way some kind of optical activity. Therefore, Lord Kelvin introduced the
notion of the chirality measure of a medium. This coefficient expresses the optical activity
of the underlying material. The corresponding constitutive laws are the following:

$$D = \varepsilon E + \varepsilon\beta\,\operatorname{rot} E \quad \text{(Drude-Born-Feodorov laws)},$$

$$B = \mu H + \mu\beta\,\operatorname{rot} H,$$

where $\varepsilon = \varepsilon(t, x)$ is the electric permittivity, $\mu = \mu(t, x)$ is the magnetic permeability
and the coefficient $\beta$ describes the chirality measure of the material. In isotropic
cases one has the possibility to use the so-called Tellegen representation

$$D = \varepsilon E + \alpha H,$$

$$B = \mu H + \alpha^* E.$$

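The connection between the electric field E and current density J is given by

$$J = \sigma E + \sigma g$$

where $\sigma$ is the electric conductivity and g a given electric source.

Starting with $\beta = 0$ and replacing D and B by $D = \varepsilon E$ and $B = \mu H$ we get in the
case of $\varepsilon = \varepsilon(x)$, $\mu = \mu(x)$

$$-\varepsilon\,\partial_t E + c\,\operatorname{rot} H = 4\pi J, \qquad (2.1)$$

$$\mu\,\partial_t H + c\,\operatorname{rot} E = 0, \qquad (2.2)$$

$$\varepsilon\,\operatorname{div} E = 4\pi\rho - (\nabla\varepsilon \cdot E), \qquad (2.3)$$

$$\mu\,\operatorname{div} H = -(\nabla\mu \cdot H). \qquad (2.4)$$

After summing (2.1) and (2.4) as well as (2.2) and (2.3) we obtain

$$-\varepsilon\,\partial_t E + c\,\operatorname{rot} H + \mu\,\operatorname{div} H = -(\nabla\mu \cdot H) + 4\pi J, \qquad (2.5)$$

$$\mu\,\partial_t H + c\,\operatorname{rot} E + \varepsilon\,\operatorname{div} E = -(\nabla\varepsilon \cdot E) + 4\pi\rho. \qquad (2.6)$$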


In the case of $\varepsilon$, $\mu$ being constants we can introduce the new functions E, H which
are defined on a homogeneous space with a first coordinate $x_0$ and the other coordinates
$x = (x_1, x_2, x_3)$. We obtain:

E(t,  x)  =:  E  ^  t,  i  cj  , 

h:,.„  =«(y* 

The  equations  (2. 5)-(2. 6)  transform  into 

d\E  4-  rot  H  +  p  c  div  H  =  Att  J, 

H  +  rot  E  -f  e  c  div  E  =  47r  p. 
3 Quaternionic representations

Let $e_1, e_2, e_3$ be the generating units of the algebra of real quaternions $\mathbb{H}$, which fulfil
the conditions

$$e_i e_j + e_j e_i = -2\delta_{ij} \quad (i, j = 1, 2, 3).$$

This leads to the following multiplication rule for two quaternions $u = u_0 + \mathbf{u}$, $v = v_0 + \mathbf{v}$:

$$uv = u_0 v_0 - \mathbf{u}\cdot\mathbf{v} + u_0\mathbf{v} + v_0\mathbf{u} + \mathbf{u}\times\mathbf{v} \quad (u_i, v_i \in \mathbb{R}),$$

where $\mathbf{u} = u_1 e_1 + u_2 e_2 + u_3 e_3$, $\mathbf{v} = v_1 e_1 + v_2 e_2 + v_3 e_3$. Further, let $u = u_0 + \mathbf{u}$ be a
quaternion. Then $\bar u = u_0 - \mathbf{u}$ is called its conjugate quaternion. The operator
defined by

$$D = \partial_1 e_1 + \partial_2 e_2 + \partial_3 e_3$$

is called Dirac operator. It acts on a quaternionic valued function as follows:

$$Du = -\operatorname{div}\mathbf{u} + \operatorname{rot}\mathbf{u} + \operatorname{grad} u_0.$$
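The multiplication rule can be checked mechanically. The sketch below represents a quaternion as a pair (scalar part, vector part), which is our own choice of data layout, implements the product via the scalar, dot and cross product terms of the rule above, and evaluates the anticommutator of the generating units.

```python
def qmul(u, v):
    """Quaternion product uv = u0 v0 - u.v + u0 v + v0 u + u x v."""
    u0, (u1, u2, u3) = u
    v0, (v1, v2, v3) = v
    dot = u1 * v1 + u2 * v2 + u3 * v3
    cross = (u2 * v3 - u3 * v2, u3 * v1 - u1 * v3, u1 * v2 - u2 * v1)
    scalar = u0 * v0 - dot
    vector = tuple(u0 * vc + v0 * uc + cc
                   for uc, vc, cc in zip((u1, u2, u3), (v1, v2, v3), cross))
    return scalar, vector


def conjugate(u):
    """Conjugate quaternion: same scalar part, negated vector part."""
    u0, (u1, u2, u3) = u
    return u0, (-u1, -u2, -u3)


# Generating units e1, e2, e3 as (scalar, vector) pairs.
e = [(0, (1, 0, 0)), (0, (0, 1, 0)), (0, (0, 0, 1))]
for i in range(3):
    for j in range(3):
        a, b = qmul(e[i], e[j]), qmul(e[j], e[i])
        anticommutator = (a[0] + b[0], tuple(p + q for p, q in zip(a[1], b[1])))
        print(i + 1, j + 1, anticommutator)
```

The anticommutator has scalar part -2 for i = j and vanishes otherwise, as required by the defining relations, and multiplying a quaternion by its conjugate leaves only the scalar part.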

With the multiplication operator

$$m_\theta u = \theta u_0 + \mathbf{u} \quad (\theta \in \mathbb{R}^+),$$
with $u = u_0 + \mathbf{u}$, $\mathbf{u} = u_1 e_1 + u_2 e_2 + u_3 e_3$, we obtain

m^diE  +  DH)  =  Air  J  , 
m£C(diH  +  DE)  =  47 r  p  , 

and  so 

+  =  rn^c  4tt  J  , 

c^LT  +  DE  =  m~^Aw  p. 

Finally,  we  get 

0(15  +  ff)  =  di  (£  +  i?)  +  Z>(i5  +  F)  =  4f(m-‘  J  +  m^p)  =:  , 

0(i5  —  H)  =  di(E  —  H)  —  D(E  —  H)  =  47r(m~<?  J  —  rn~^p)  =:  F2  , 

where $\partial$ is also called Weyl operator and $\bar\partial$ is the conjugate to $\partial$. By the way, a function
u is called quaternionic regular if $\partial u = 0$ and quaternionic anti-regular if $\bar\partial u = 0$.

To simplify we set $E + H =: v$ and $E - H =: w$. Then it follows

$$\partial v = F_1(v, w), \qquad (3.1)$$

$$\bar\partial w = F_2(v, w). \qquad (3.2)$$

Let  us  have  a  closer  look  at  the  functions  F\,F2,  The  electric  current  density  J  is  given 
by 

J  =  cr  E  +  cr  g, 
where E and g are vector functions. This leads to the following simplification
Fi  =  47t  \a(E  +  £?)  +  —]  =  27r  fcr(v  +  w)  +  og  +  — 1  , 

i*  SC  J  L  SC  J 

F2  =  47r  |a(J£  +  <7<?)  -  =  271-  [a(t>  +  w)  +  g  -  —  j  . 

Hence 

F2  =  -Fi  . 

Thus 

=  Fi(v,w ),  (3.3) 

dw  =  —Fi(v,w).  (3.4) 


4  Integral  representation 

Let G be a bounded domain in $\mathbb{R}^3$ and a a positive constant. We consider in $\mathbb{R}^4$ the
cylinder $Z = G \times [-a, a]$. A right inverse to the Weyl operator is the following Teodorescu
transform:

(Tzu)(x)  =  —  f  e(x  -  y)u(y)dy  ,  Z  =  G  x  [- a ,  a] 
cr3  J 
z 

with  e(x)  =  x/\x\A>  cr3  =  27r3/2/r(3/2).  We  obtain  in  a  straightforward  manner 


dTz«=  j  J 

and 

Iz  d  w  +  4>z  =  |  q 

with  €  ker  d.  In  complete  analogy  a  conjugate  Teodorescu  transform  is  introduced. 
We  just  have  to  replace  e(;r)  by  its  conjugate.  Now  it  follows  from  (7)-(8)  that 

v  =  TzFi{v ,  w)  +  <j>z  ( d(j)z  =0), 

w  =  T*zF2(v,w)  +  <j>*z  (50J=O). 

Furthermore  we  have  to  introduce  Cauchy-Bizadse-type  operators,  which  are  defined 
by  the  boundary  data.  These  operators  read  as  follows: 

(Fgzu)(x)  =  —  f  e(x  -  y)n{y)u{y)d{dZ)y  (x  <£  dZ) 

&3  J 


{Fezu)(x)  :=  —  /  e(x  -  y)n(y)u{y)d(dZ)v,  (x  £  dZ) 
^3  J 
dz 


in  Z, 

in  R4  \  Z , 


in  Z, 

in  R  4  \  Z, 


where  n(y)  =  (no  +  n)(y)  denotes  the  unit  vector  of  the  outer  normal  on  dZ  at  the  point 
V • 


An  alternative  approach  for  solving  Maxwell  equations 
It  can  be  proved  that 

<j>z  =  Fdzw  and  &Z  =  FqZv  in  Z  . 

It  should  be  noted  that  we  do  not  need  the  whole  trace  of  the  functions  w  and  v 
on  the  boundary.  We  just  have  to  consider  these  parts  of  trzv  (  trzw)  which  are  lying 
in  the  corresponding  Hardy  space  of  functions,  which  permit  a  quaternionic  regular 
(quaternionic  anti-regular)  extension  into  Z,  accordingly.  We  get  the  integral  equations 

v  =  47ra  Tz(v  +  w)  +  47r  Tz(crg  +  —;)  +  h,  (4.1) 


47T<T  Tz{y  +  w)  +  47T  Tz{(jg  -  — )  -j-  h*, 


where 


h  =  Fdztrezv  and  h*  =  FQZtrezw. 

If  h,  h*  are  known  then  under  smallness  conditions  the  iteration  procedure: 
vn  =  AttctTz  (vn—i  +  wn- 1)  A  4ttTz  (a g  +  — )  +  h, 

EC 

=  47raT|(i;n_i  +  wn_i)  +  47rT|(ap  -  — )  +  ft*, 

EC 

with  (^q  =  tuo  =  0)  will  converge  in  suitable  Banach  spaces. 

Remark  4.1  In  [1]  is  proved  the  following  estimation: 

•  \\TzUl^C)<  ~^a\G\. 

5  Weak  time  dependent  Maxwell-equations 

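Assume now $\varepsilon = \varepsilon(x)$, $\mu = \mu(x)$, $\kappa = \kappa(x)$ ($g = 0$) and

$E(t, x) = E_0(t)E_1(x)$ and $H(t, x) = H_0(t)H_1(x)$,
where the scalar functions $E_0$ and $H_0$ are known. Maxwell equations then transform to

$$c\,E_0\,\operatorname{rot} E_1 = -\partial_t(\mu H_0)\,H_1, \qquad (5.1)$$

$$c\,H_0\,\operatorname{rot} H_1 = (\partial_t(\varepsilon E_0) + 4\pi\kappa E_0)\,E_1, \qquad (5.2)$$

$$E_0\,[(\nabla\varepsilon \cdot E_1) + \varepsilon\,\operatorname{div} E_1] = 4\pi\rho, \qquad (5.3)$$

$$(\nabla\mu \cdot H_1) + \mu\,\operatorname{div} H_1 = 0. \qquad (5.4)$$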

rot  Ei  — 
rot  Hi  = 
•div  Ei  = 


p  dtH0 


Hi  =:a0Hi 


C  Hq  C  HqJ 

47 TO  Ve  _  ,  „ 

—  +  —  .El=p  —  a  ■  Ei  , 


It  follows 
-div  Hx  =  ^  •  Hx  =  -0  •  Hi  .

V  ~ 

Here  a  =  a0  +  a,  P  =  P0  +  P,  «  :=  ,  /3  :=  Using  the  fact  that  in  1H 

Du  =  -  div  w  +  rot  u, 

we  get 

DE1  =  aotfi+p'-a*#!, 

D/f!  =  W-fffl. 

The  right  inverse  of  £>  is  the  corresponding  Teodorescu  transform  Tg  over  G  C  IR3.  A 
short  calculation  leads  to 

E\  —  TqolqHi  —  Tqoi.  •  E\  +  Top7  +  <t>\ , 
iZi  =  Tg/?o-Ei  ~  Tq§_  -  +  fa  j 

where  fa  €  ker  jD  (i  =  1, 2).  The  iteration  method 

E[n)  =  -TGa-E[n-1) +TGa0H[n-1) +  Tap' +  <j>i, 

H[n)  =  TgPq  ■  E {n)  -  TGP  ■  H[n~1)  +  fa, 

with  —  e[0^  —  0  converges  in  suitable  Banach  spaces  (L2,  VUj, C)  under  smallness 
conditions. 


In  the  time-harmonic  case  i.e.  H0-E0  =  l  and  e,/z,  are  constants  and  k  =  k(x)  we 
have 

DEi  =  pf  and  DHi  =  (30  Ei . 

Setting  fa  =  S~x  we  obtain 

D  5D  Hx  =  =  p', 

i.e. 

A  Hi  =  —f. 

If  boundary  values  of  Hi  (trpiZi)  are  known  i.e.  tr^Hx  =  g  the  complete  solution  is 
given  by 

Hi  =  Frg  +  TaVsDh  +  TGQs6TGf  ■  (5.5) 

Here  Vs  and  Qs  are  orthoprojections  on  subspaces  in  the  quaternionic  Hilbert  space 
L2(G),  namely 

L2(G)  =  -5  ker  D  n  L2{G)  0  D  w\  (G). 

The  scalar  product  is  defined  by 

(u,v)5  :=  j  uSvdGelH. 

G 

The  operator  Vs  can  be  seen  as  a  generalized  Bergman  projection. 

In  the  representation  formula  from  above  is  Fr  the  Cauchy-Bizadse  operator  on  T 
and  h  a  smooth  continuation  of  g  into  G .  Note  that  Vs  and  Qs  can  be  explicitly  defined 
(cf.  [9])!  Then 

Ei  =  -f—  VsDh  +  QsSTaf. 

47 TK 

Let  us  prove  that  the  boundary  condition  is  fulfilled!  Indeed, 

~  o  1  • 

QsTgJ  =  Df  with  /  ew2  i-e.  trrf  =  0. 

TcDf  =  f  —  Fpf  ~  0  (Borel-Pompeiu’s  formula). 

On the other hand, the Plemelj-Sokhotzkij formulae yield:

$$\mathrm{tr}_\Gamma H_1 = P_\Gamma g + \mathrm{tr}_\Gamma T_G \mathbf{P}_\delta D h = P_\Gamma g + \mathrm{tr}_\Gamma T_G D h - \mathrm{tr}_\Gamma T_G \mathbf{Q}_\delta D h$$

$$= P_\Gamma g + g - P_\Gamma g + 0 = g.$$

$P_\Gamma$ is the so-called Plemelj projection onto the Hardy space of $\mathbb{H}$-regular functions
extendible into $G$.
Abstract

Fitting  of  parametric  curves  and  surfaces  to  a  set  of  given  data  points  is  a  relevant 
subject  in  various  fields  of  science  and  engineering.  In  this  paper,  we  review  the  current 
orthogonal  distance  fitting  algorithms  for  parametric  models  in  a  well  organized  and  easily 
understandable  manner,  and  present  a  new  algorithm.  Each  of  these  algorithms  estimates 
the  model  parameters  minimizing  the  square  sum  of  the  error  distances  between  the  model 
feature  and  the  given  data  points.  The  model  parameters  are  grouped  and  simultaneously 
estimated  in  terms  of  form,  position,  and  rotation  parameters.  The  form  parameters 
determine  the  shape  of  the  model  feature,  and  the  position/rotation  parameters  describe 
the  rigid  body  motion  of  the  model  feature.  The  new  algorithm  is  applicable  to  any  kind 
of  parametric  curve  and  surface.  We  give  fitting  examples  for  circle,  cylinder,  and  helix 
in  space. 

1  Introduction 

The  use  of  parametric  curves  and  surfaces  is  very  common  and  model  fitting  to  a  set  of 
given  data  points  is  a  relevant  subject  in  various  fields  of  science  and  engineering.  For 
fitting  of  curves  and  surfaces,  orthogonal  distance  fitting  is  of  primary  concern  because 
of  the  applied  error  definition,  namely  the  shortest  distance  from  the  given  point  to  the 
model  feature  [5,  9] .  While  there  are  orthogonal  distance  fitting  algorithms  for  explicit  [3], 
and  implicit  models  [2,  7]  in  the  literature,  we  are  considering  in  this  paper  fitting 
algorithms  for  parametric  models  [4,  6,  8,  10,  11]  (Fig.  1). 

The goal of orthogonal distance fitting is the estimation of the model parameters
minimizing the performance index

$$\sigma_0^2 = (\mathbf{X} - \mathbf{X}')^T P^T P (\mathbf{X} - \mathbf{X}') \qquad (1.1)$$

or

$$\sigma_0^2 = \mathbf{d}^T P^T P \mathbf{d}, \qquad (1.2)$$

where $\mathbf{X}^T = (\mathbf{X}_1^T, \ldots, \mathbf{X}_m^T)$ and $\mathbf{X}'^T = (\mathbf{X}_1'^T, \ldots, \mathbf{X}_m'^T)$ are the coordinate vectors of the
m given points and of the m corresponding points on the model feature, respectively.
Moreover, $\mathbf{d}^T = (d_1, \ldots, d_m)$ is the distance vector with $d_i = \|\mathbf{X}_i - \mathbf{X}_i'\|$, and $P^T P$ is the
weighting matrix. We call the fitting algorithms based on the performance indexes
(1.1) and (1.2) the coordinate-based algorithm and the distance-based algorithm, respectively.

Fig. 1. Parametric features, and the orthogonal contacting point $\mathbf{x}_i'$ in frame xyz from
the given point $\mathbf{X}_i$ in frame XYZ: (a) Curve; (b) Surface.
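The two performance indexes can be compared on a toy data set. The following is a minimal sketch, not taken from the paper: it assumes a diagonal weighting matrix $P^T P$ with one weight per point (applied to all three coordinates of that point), and the function names are hypothetical. Under that assumption (1.1) and (1.2) give the same value, because $\|\mathbf{X}_i - \mathbf{X}_i'\|^2 = d_i^2$.

```python
import math


def sigma0_sq_coordinate(points, model_points, weights):
    """Index (1.1): weighted sum of squared coordinate differences.

    Assumes P^T P is diagonal with one weight per point (an illustrative choice).
    """
    total = 0.0
    for X, Xp, w in zip(points, model_points, weights):
        total += w * sum((a - b) ** 2 for a, b in zip(X, Xp))
    return total


def sigma0_sq_distance(points, model_points, weights):
    """Index (1.2): weighted sum of squared distances d_i = ||X_i - X_i'||."""
    total = 0.0
    for X, Xp, w in zip(points, model_points, weights):
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(X, Xp)))
        total += w * d * d
    return total


# Hypothetical data: two given points and their corresponding model points.
given = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
on_model = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
value_11 = sigma0_sq_coordinate(given, on_model, [1.0, 1.0])
value_12 = sigma0_sq_distance(given, on_model, [1.0, 1.0])
```

The two indexes differ in how they are linearised during iteration (3m coordinate residuals versus m distance residuals), not in the value they assign to a given configuration under this weighting.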

In this paper, the model parameters $\mathbf{a}$ are grouped and simultaneously estimated
in three categories. First, the form parameters $\mathbf{a}_g$ (e.g. the three axis lengths a, b, c of an
ellipsoid) describe the shape of the standard model feature defined in the model coordinate
system xyz (Fig. 1)

$$\mathbf{x} = \mathbf{x}(\mathbf{a}_g, \mathbf{u}) \quad \text{with} \quad \mathbf{a}_g = (a_1, \ldots, a_l)^T. \qquad (1.3)$$

The form parameters are invariant to the rigid body motion of the model feature. The
second and third parameter groups, respectively the position parameters $\mathbf{a}_p$ and the
rotation parameters $\mathbf{a}_r$, describe the rigid body motion of the model feature in the machine
coordinate system XYZ:

$$\mathbf{X} = R^{-1}\mathbf{x} + \mathbf{X}_0 \quad \text{or} \quad \mathbf{x} = R(\mathbf{X} - \mathbf{X}_0), \qquad (1.4)$$

where $R = R_\kappa R_\varphi R_\omega = (\mathbf{r}_1\ \mathbf{r}_2\ \mathbf{r}_3)^T$, $R^{-1} = R^T$,

$\mathbf{a}_p = \mathbf{X}_0 = (X_0, Y_0, Z_0)^T$, and $\mathbf{a}_r = (\omega, \varphi, \kappa)^T$.
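The transformation (1.4) between the machine frame XYZ and the model frame xyz can be sketched as follows. This is an illustration under stated assumptions: the elementary rotations are taken to be standard right-handed rotations about the x, y and z axes composed as $R_\kappa R_\varphi R_\omega$, which may differ from the sign convention actually used by the authors, and all function names are hypothetical. Only the orthogonality $R^{-1} = R^T$ is essential for the round trip.

```python
import math


def rot_x(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_y(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def rot_z(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def matvec(A, v):
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]


def transpose(A):
    return [list(row) for row in zip(*A)]


def rotation(omega, phi, kappa):
    """R = R_kappa R_phi R_omega (assumed axis and sign convention)."""
    return matmul(rot_z(kappa), matmul(rot_y(phi), rot_x(omega)))


def to_model_frame(X, R, X0):
    """x = R (X - X0), second form of (1.4)."""
    return matvec(R, [X[i] - X0[i] for i in range(3)])


def to_machine_frame(x, R, X0):
    """X = R^{-1} x + X0 with R^{-1} = R^T, first form of (1.4)."""
    y = matvec(transpose(R), x)
    return [y[i] + X0[i] for i in range(3)]


R = rotation(0.3, -0.7, 1.1)
X0 = [1.0, 2.0, 3.0]
X = [4.0, -5.0, 6.0]
X_back = to_machine_frame(to_model_frame(X, R, X0), R, X0)
```

Keeping the form parameters in the model frame and the motion in `R` and `X0` is what lets the same position/rotation code serve every model feature.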


A subproblem of the orthogonal distance fitting of a parametric model is finding
the location parameters $\{\mathbf{u}_i\}_{i=1}^m$, which represent the nearest points $\{\mathbf{X}_i'\}_{i=1}^m$ on the
model feature from each given point $\{\mathbf{X}_i\}_{i=1}^m$. The model parameters $\mathbf{a}$ and the location
parameters $\{\mathbf{u}_i\}$ are generally estimated through iteration. By the total method [6,
10], $\mathbf{a}$ and $\{\mathbf{u}_i\}_{i=1}^m$ are determined simultaneously, while they are estimated separately
by the variable-separation method [4, 8, 11] in a nested iteration scheme. This gives
four possible combinations of algorithmic approaches, as shown in Table 1. One of the
algorithmic approaches in Table 1 results in an obviously underdetermined linear system
for iteration and thus has no practical application. We describe and compare the three
realistic algorithmic approaches in the following sections.

Algorithmic approaches     Total method              Variable-separation method
Distance-based algor.      Underdetermined system    II (NPL [4, 11])
Coordinate-based algor.    I (ETH [6, 10])           III (FhG, this paper)

Tab. 1. Orthogonal distance fitting algorithms for parametric models.

2  Orthogonal  distance  fitting  algorithm  I  (ETH) 

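The ETH algorithm [6, 10] is based on the performance index (1.1), and simultaneously
estimates the model parameters $\mathbf{a}$ and the location parameters $\{\mathbf{u}_i\}_{i=1}^m$ for the nearest
points $\{\mathbf{X}_i'\}_{i=1}^m$ on the model feature. We introduce the new estimation parameter vector $\mathbf{b}$
containing $\mathbf{a}$ and $\{\mathbf{u}_i\}_{i=1}^m$ as follows,

$$\mathbf{b}^T = (\mathbf{a}^T, \mathbf{u}_1^T, \ldots, \mathbf{u}_m^T).$$

The parameter vector $\mathbf{b}$ minimizing the performance index (1.1) can be determined by
the Gauss-Newton method

$$P \frac{\partial \mathbf{X}'}{\partial \mathbf{b}}\Big|_k \Delta\mathbf{b} = P(\mathbf{X} - \mathbf{X}')\big|_k, \qquad \mathbf{b}_{k+1} = \mathbf{b}_k + \alpha\Delta\mathbf{b}, \qquad (2.1)$$

with the Jacobian matrices of each point $\mathbf{X}_i'$ on the model feature, from (1.3) and (1.4)

$$J_{\mathbf{X}_i', \mathbf{b}} = \frac{\partial \mathbf{X}'}{\partial \mathbf{b}}\Big|_{\mathbf{X}' = \mathbf{X}_i'} = \Big(R^{-1}\frac{\partial \mathbf{x}}{\partial \mathbf{b}} + \frac{\partial R^{-1}}{\partial \mathbf{b}}\mathbf{x} + \frac{\partial \mathbf{X}_0}{\partial \mathbf{b}}\Big)\Big|_{\mathbf{u} = \mathbf{u}_i'}$$

$$= \Big(R^{-1}\frac{\partial \mathbf{x}}{\partial \mathbf{a}_g}\ \Big|\ I\ \Big|\ \frac{\partial R^{-1}}{\partial \mathbf{a}_r}[\mathbf{x}]\ \Big|\ 0_1, \ldots, 0_{i-1},\ R^{-1}\frac{\partial \mathbf{x}}{\partial \mathbf{u}},\ 0_{i+1}, \ldots, 0_m\Big)\Big|_{\mathbf{u} = \mathbf{u}_i'}.$$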

A  disadvantage  of  the  ETH  algorithm  is  that  the  storage  space  and  the  computing  time 
cost  increase  very  rapidly  with  the  number  of  the  data  points,  unless  the  sparse  linear 
system  (2.1)  is  handled  beforehand  by  a  sparse  matrix  algorithm. 

3  Orthogonal  distance  fitting  algorithm  II  (NPL) 

The NPL algorithm [4, 11] is based on the performance index (1.2), and separately
estimates the model parameters $\mathbf{a}$ and the location parameters $\{\mathbf{u}_i\}_{i=1}^m$ in a nested iteration
scheme

$$\min_{\mathbf{a}} \min_{\{\mathbf{u}_i\}_{i=1}^m} \sigma_0^2\big(\{\mathbf{X}_i'(\mathbf{a}, \mathbf{u})\}_{i=1}^m\big).$$

The inner iteration determines the location parameters $\{\mathbf{u}_i'\}_{i=1}^m$ for the minimum distance
points $\{\mathbf{X}_i'\}_{i=1}^m$ on the current model feature from each given point $\{\mathbf{X}_i\}_{i=1}^m$, and the
outer iteration updates the model parameters. In this paper, in order to implement the
parameter grouping $\mathbf{a}^T = (\mathbf{a}_g^T, \mathbf{a}_p^T, \mathbf{a}_r^T)$, we have modified the initial NPL algorithm.

3.1  Orthogonal  contacting  point 

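For each given point $\mathbf{x}_i = R(\mathbf{X}_i - \mathbf{X}_0)$ in frame xyz, we determine the orthogonal contacting
point $\mathbf{x}_i'$ on the standard model feature (1.3). Then, the orthogonal contacting point
$\mathbf{X}_i'$ in frame XYZ to the given point $\mathbf{X}_i$ is obtained through a backward transformation
of $\mathbf{x}_i'$ into XYZ. We search for the location parameters $\mathbf{u}$ which minimize the
error distance between the given point $\mathbf{x}_i$ and the corresponding point $\mathbf{x}$ on the model
feature (1.3)

$$D = (\mathbf{x}_i - \mathbf{x}(\mathbf{a}_g, \mathbf{u}))^T (\mathbf{x}_i - \mathbf{x}(\mathbf{a}_g, \mathbf{u})). \qquad (3.1)$$

The first order necessary condition for a minimum of (3.1) as a function of $\mathbf{u}$ is

$$\mathbf{f}(\mathbf{x}_i, \mathbf{x}(\mathbf{a}_g, \mathbf{u})) = \frac{1}{2}\begin{pmatrix} D_u \\ D_v \end{pmatrix} = -\begin{pmatrix} (\mathbf{x}_i - \mathbf{x}(\mathbf{a}_g, \mathbf{u}))^T \mathbf{x}_u \\ (\mathbf{x}_i - \mathbf{x}(\mathbf{a}_g, \mathbf{u}))^T \mathbf{x}_v \end{pmatrix} = \mathbf{0}. \qquad (3.2)$$

The condition (3.2) means that the error vector $(\mathbf{x}_i - \mathbf{x})$ and the surface tangent vectors
$\partial\mathbf{x}/\partial\mathbf{u}$ at $\mathbf{x}$ should be orthogonal. We solve (3.2) for $\mathbf{u}$ by using the Newton method
(how to derive the Jacobian matrix $\partial\mathbf{f}/\partial\mathbf{u}$ is shown in Section 4)

$$\frac{\partial \mathbf{f}}{\partial \mathbf{u}}\Big|_k \Delta\mathbf{u} = -\mathbf{f}(\mathbf{u})\big|_k, \qquad \mathbf{u}_{k+1} = \mathbf{u}_k + \alpha\Delta\mathbf{u}. \qquad (3.3)$$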
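The inner iteration (3.3) can be sketched for a curve, where $\mathbf{u}$ reduces to a scalar $u$ and (3.2) to the single equation $f(u) = -(\mathbf{x}_i - \mathbf{x})^T \mathbf{x}_u = 0$ with derivative $\partial f/\partial u = \mathbf{x}_u^T\mathbf{x}_u - (\mathbf{x}_i - \mathbf{x})^T\mathbf{x}_{uu}$. The example below is an illustration only: it uses the helix model of Section 4 with hypothetical parameter values, a user-supplied starting value `u_start`, and invented function names; it is not the authors' implementation.

```python
import math


def helix(r, h, u):
    """Helix x(r, h, u) with its first and second derivatives in u."""
    c = h / (2.0 * math.pi)
    x = (r * math.cos(u), r * math.sin(u), c * u)
    xu = (-r * math.sin(u), r * math.cos(u), c)
    xuu = (-r * math.cos(u), -r * math.sin(u), 0.0)
    return x, xu, xuu


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def contact_parameter(xi, r, h, u_start, alpha=1.0, tol=1e-12, max_iter=50):
    """Newton iteration (3.3) for the scalar condition f(u) = 0 of (3.2)."""
    u = u_start
    for _ in range(max_iter):
        x, xu, xuu = helix(r, h, u)
        err = [xi[k] - x[k] for k in range(3)]   # error vector x_i - x
        f = -dot(err, xu)                         # condition (3.2)
        df = dot(xu, xu) - dot(err, xuu)          # Jacobian df/du
        du = -f / df
        u += alpha * du
        if abs(du) < tol:
            break
    return u


# Hypothetical test point: offset radially (orthogonally to the tangent)
# from the helix point at u = 1.2, so the contacting parameter is 1.2.
r, h, u_true = 6.0, 19.5, 1.2
xi = ((r + 0.5) * math.cos(u_true), (r + 0.5) * math.sin(u_true),
      h * u_true / (2.0 * math.pi))
u_found = contact_parameter(xi, r, h, u_start=1.5)
```

Newton's method only finds a stationary point of $D$ near the starting value, so a reasonable initial $u$ (for example from the previous outer iteration) matters in practice.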


3.2  Orthogonal  distance  fitting 

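We update the model parameters $\mathbf{a}$ minimizing the performance index (1.2) by using
the Gauss-Newton method (outer iteration)

$$P \frac{\partial \mathbf{d}}{\partial \mathbf{a}}\Big|_k \Delta\mathbf{a} = -P\mathbf{d}\big|_k, \qquad \mathbf{a}_{k+1} = \mathbf{a}_k + \alpha\Delta\mathbf{a}.$$

From $d_i = \|\mathbf{X}_i - \mathbf{X}_i'\|$, and equations (1.3) and (1.4), we derive the Jacobian matrices of
each orthogonal distance $d_i$

$$J_{d_i, \mathbf{a}} = \frac{\partial d_i}{\partial \mathbf{a}} = -\frac{(\mathbf{X}_i - \mathbf{X}_i')^T}{\|\mathbf{X}_i - \mathbf{X}_i'\|}\frac{\partial \mathbf{X}'}{\partial \mathbf{a}}\Big|_{\mathbf{u} = \mathbf{u}_i'}$$

$$= -\frac{(\mathbf{X}_i - \mathbf{X}_i')^T}{\|\mathbf{X}_i - \mathbf{X}_i'\|}\Big(R^{-1}\Big(\frac{\partial \mathbf{x}}{\partial \mathbf{a}} + \frac{\partial \mathbf{x}}{\partial \mathbf{u}}\frac{\partial \mathbf{u}}{\partial \mathbf{a}}\Big) + \frac{\partial R^{-1}}{\partial \mathbf{a}}\mathbf{x} + \frac{\partial \mathbf{X}_0}{\partial \mathbf{a}}\Big)\Big|_{\mathbf{u} = \mathbf{u}_i'}.$$

With (1.4) and (3.2) at $\mathbf{u} = \mathbf{u}_i'$,

$$(\mathbf{X}_i - \mathbf{X}_i')^T R^{-1}\frac{\partial \mathbf{x}}{\partial \mathbf{u}}\Big|_{\mathbf{u} = \mathbf{u}_i'} = (\mathbf{x}_i - \mathbf{x}_i')^T\frac{\partial \mathbf{x}}{\partial \mathbf{u}}\Big|_{\mathbf{u} = \mathbf{u}_i'} = \mathbf{0}^T,$$

and

$$J_{d_i, \mathbf{a}} = -\frac{(\mathbf{X}_i - \mathbf{X}_i')^T}{\|\mathbf{X}_i - \mathbf{X}_i'\|}\Big(R^{-1}\frac{\partial \mathbf{x}}{\partial \mathbf{a}_g}\ \Big|\ I\ \Big|\ \frac{\partial R^{-1}}{\partial \mathbf{a}_r}[\mathbf{x}]\Big)\Big|_{\mathbf{u} = \mathbf{u}_i'}$$

is the resultant Jacobian matrix for $d_i$. A drawback of the NPL algorithm is that the
convergence and the accuracy of 3D-curve fitting (e.g. fitting of a circle in space) are
relatively poor. 2D-curve fitting and surface fitting with the NPL algorithm do not suffer
from such problems.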

4  Orthogonal  distance  fitting  algorithm  III  (FhG) 

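At the Fraunhofer Institute IPA (FhG-IPA), a new orthogonal distance fitting algorithm
for parametric models has been developed, which minimizes the performance index (1.1) in a
nested iteration scheme (variable-separation method). The new algorithm is a generalized
extension of an orthogonal distance fitting algorithm for implicit plane curves [1].
Interested readers are referred to [2] for the orthogonal distance fitting of implicit surfaces
and plane curves. The location parameter values $\{\mathbf{u}_i'\}_{i=1}^m$ for the minimum distance
points $\{\mathbf{X}_i'\}_{i=1}^m$ on the current model feature from each given point $\{\mathbf{X}_i\}_{i=1}^m$ are
found by the algorithm described in Section 3.1 (inner iteration). In this section, we
describe the outer iteration, which updates the model parameters $\mathbf{a}$ minimizing
the performance index (1.1) by using the Gauss-Newton method

$$P \frac{\partial \mathbf{X}'}{\partial \mathbf{a}}\Big|_k \Delta\mathbf{a} = P(\mathbf{X} - \mathbf{X}')\big|_k, \qquad \mathbf{a}_{k+1} = \mathbf{a}_k + \alpha\Delta\mathbf{a}, \qquad (4.1)$$

with the Jacobian matrices of each orthogonal contacting point $\mathbf{X}_i'$, from (1.3) and (1.4)

$$J_{\mathbf{X}_i', \mathbf{a}} = \frac{\partial \mathbf{X}'}{\partial \mathbf{a}}\Big|_{\mathbf{X}' = \mathbf{X}_i'} = \Big(R^{-1}\Big(\frac{\partial \mathbf{x}}{\partial \mathbf{a}} + \frac{\partial \mathbf{x}}{\partial \mathbf{u}}\frac{\partial \mathbf{u}}{\partial \mathbf{a}}\Big) + \frac{\partial R^{-1}}{\partial \mathbf{a}}\mathbf{x} + \frac{\partial \mathbf{X}_0}{\partial \mathbf{a}}\Big)\Big|_{\mathbf{u} = \mathbf{u}_i'}$$

$$= R^{-1}\frac{\partial \mathbf{x}}{\partial \mathbf{u}}\frac{\partial \mathbf{u}}{\partial \mathbf{a}}\Big|_{\mathbf{u} = \mathbf{u}_i'} + \Big(R^{-1}\frac{\partial \mathbf{x}}{\partial \mathbf{a}_g}\ \Big|\ I\ \Big|\ \frac{\partial R^{-1}}{\partial \mathbf{a}_r}[\mathbf{x}]\Big)\Big|_{\mathbf{u} = \mathbf{u}_i'}. \qquad (4.2)$$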


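The derivative matrix $\partial\mathbf{u}/\partial\mathbf{a}$ at $\mathbf{u} = \mathbf{u}_i'$ in (4.2) describes the variational behavior of
the location parameters $\mathbf{u}_i'$ for the orthogonal contacting point $\mathbf{x}_i'$ in frame xyz relative
to the differential changes of the parameter vector $\mathbf{a}$. We derive $\partial\mathbf{u}/\partial\mathbf{a}$
from the condition (3.2). Because (3.2) has an implicit form, its derivatives lead to

$$\frac{\partial \mathbf{f}}{\partial \mathbf{u}}\frac{\partial \mathbf{u}}{\partial \mathbf{a}} + \frac{\partial \mathbf{f}}{\partial \mathbf{x}_i}\frac{\partial \mathbf{x}_i}{\partial \mathbf{a}} + \frac{\partial \mathbf{f}}{\partial \mathbf{a}} = 0 \quad \text{or} \quad \frac{\partial \mathbf{f}}{\partial \mathbf{u}}\frac{\partial \mathbf{u}}{\partial \mathbf{a}} = -\Big(\frac{\partial \mathbf{f}}{\partial \mathbf{x}_i}\frac{\partial \mathbf{x}_i}{\partial \mathbf{a}} + \frac{\partial \mathbf{f}}{\partial \mathbf{a}}\Big), \qquad (4.3)$$

where $\partial\mathbf{x}_i/\partial\mathbf{a}$ is, from $\mathbf{x}_i = R(\mathbf{X}_i - \mathbf{X}_0)$,

$$\frac{\partial \mathbf{x}_i}{\partial \mathbf{a}} = \frac{\partial R}{\partial \mathbf{a}}(\mathbf{X}_i - \mathbf{X}_0) - R\frac{\partial \mathbf{X}_0}{\partial \mathbf{a}} = \Big(0\ \Big|\ -R\ \Big|\ \frac{\partial R}{\partial \mathbf{a}_r}[\mathbf{X}_i - \mathbf{X}_0]\Big).$$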

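The other three matrices $\partial\mathbf{f}/\partial\mathbf{u}$, $\partial\mathbf{f}/\partial\mathbf{x}_i$, and $\partial\mathbf{f}/\partial\mathbf{a}$ in (3.3) and (4.3) are to be directly
derived from (3.2). The elements of these three matrices are composed of simple linear
combinations of components of the error vector $(\mathbf{x}_i - \mathbf{x})$ with elements of the following
three vectors/matrices $\partial\mathbf{x}/\partial\mathbf{u}$, $H$, and $G$ (XHG matrix):

$$\frac{\partial \mathbf{x}}{\partial \mathbf{u}} = (\mathbf{x}_u\ \mathbf{x}_v), \qquad H = \begin{pmatrix} \mathbf{x}_{uu} & \mathbf{x}_{uv} \\ \mathbf{x}_{vu} & \mathbf{x}_{vv} \end{pmatrix}, \qquad G = \begin{pmatrix} G_0 \\ G_1 \\ G_2 \end{pmatrix} = \frac{\partial}{\partial \mathbf{a}_g}\begin{pmatrix} \mathbf{x} \\ \mathbf{x}_u \\ \mathbf{x}_v \end{pmatrix}, \qquad (4.4)$$

$$\frac{\partial \mathbf{f}}{\partial \mathbf{u}} = (\mathbf{x}_u\ \mathbf{x}_v)^T(\mathbf{x}_u\ \mathbf{x}_v) - \begin{pmatrix} (\mathbf{x}_i - \mathbf{x})^T\mathbf{x}_{uu} & (\mathbf{x}_i - \mathbf{x})^T\mathbf{x}_{uv} \\ (\mathbf{x}_i - \mathbf{x})^T\mathbf{x}_{vu} & (\mathbf{x}_i - \mathbf{x})^T\mathbf{x}_{vv} \end{pmatrix}, \qquad \frac{\partial \mathbf{f}}{\partial \mathbf{x}_i} = -(\mathbf{x}_u\ \mathbf{x}_v)^T,$$

$$\frac{\partial \mathbf{f}}{\partial \mathbf{a}} = \left(\begin{matrix} \mathbf{x}_u^T G_0 - (\mathbf{x}_i - \mathbf{x})^T G_1 \\ \mathbf{x}_v^T G_0 - (\mathbf{x}_i - \mathbf{x})^T G_2 \end{matrix}\ \Bigg|\ 0\ \Bigg|\ 0\right).$$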

Now (4.3) can be solved for $\partial\mathbf{u}/\partial\mathbf{a}$ at $\mathbf{u} = \mathbf{u}_i'$, and the Jacobian matrix (4.2) and the
linear system (4.1) can be completed and solved for the parameter update $\Delta\mathbf{a}$.

We would like to stress that only the standard model equation (1.3), without involvement
of the position/rotation parameters, is required in (4.4). The overall structure of
the FhG algorithm remains unchanged for all dimensional fitting problems of parametric
models. All that is necessary for a new parametric model is to derive the XHG matrix
of (4.4) from (1.3) of the new model feature, and to supply a proper set of initial parameter
values $\mathbf{a}_0$ for iteration (4.1). An overall schematic information flow with the FhG
algorithm is shown in Fig. 2. The FhG algorithm shows robust and fast convergence
with 2D/3D-curve and surface fitting. The storage space and computing time cost are
proportional to the number of data points. A disadvantage of the FhG algorithm is that
it additionally requires the second derivatives $\partial^2\mathbf{x}/\partial\mathbf{a}_g\partial\mathbf{u}$ as shown in (4.4).

Fig. 2. Information flow with the FhG algorithm.

X    5    6    5    5    3    2    0   -1   -1    0    3    4    7    9
Y    1    3    4    6    5    4    2    0   -2   -5   -7   -8  -10   -9
Z   -3   -1    1    3    5    7    9   11   11   11   11   11   11   10

Tab. 2. Fourteen coordinate triples representing a helix.

As a fitting example, we show the orthogonal distance fitting of a helix. The standard
model feature (1.3) of a helix in frame xyz can be described as follows:

$$\mathbf{x}(\mathbf{a}_g, u) = \mathbf{x}(r, h, u) = (r\cos u,\ r\sin u,\ hu/2\pi)^T,$$

with a constraint on the position and rotation parameters

$$f_c(\mathbf{a}_p, \mathbf{a}_r) = (\mathbf{X}_0 - \bar{\mathbf{X}})^T \mathbf{r}_3(\omega, \varphi) = 0,$$

where r and h are respectively the radius and elevation of the helix, $\bar{\mathbf{X}}$ is the gravitational
center of the given point set, and $\mathbf{r}_3$ (see (1.4)) is the vector of direction cosines of
the z-axis. We have obtained the initial parameter values from a 3D-circle fitting and
a cylinder fitting, successively. The helix fitting to the point set in Table 2 with the
initial values of h = 10 and $\kappa = \pi$ terminated after 0.22 s and 8 iteration cycles for $\|\Delta\mathbf{a}\| =
3.2 \times 10^{-7}$ on a Pentium 133 MHz PC (Table 3, Fig. 3). The corresponding figures were 0.33 s and 10 iteration
cycles for $\|\Delta\mathbf{a}\| = 3.6 \times 10^{-7}$ with the ETH algorithm, and 1.05 s and 61 iteration cycles for
$\|\Delta\mathbf{a}\| = 8.8 \times 10^{-7}$ with the NPL algorithm. The computing cost with the ETH algorithm
increases rapidly with the number of data points. The NPL algorithm showed slow
convergence with the 3D-circle and the helix fitting (3D-curve fitting).

Parameters a   σ0       r        h         X0       Y0        Z0       ω         φ        κ
3D-Circle      5.8913   8.3850   -         5.6999   -2.7923   5.2333   -0.6833   0.7882   -
σ(a)           -        0.7355   -         0.9939   0.8421    0.8821   0.1177    0.1375   -
Cylinder       1.6925   8.2835   -         4.7596   -3.0042   4.5081   -0.4576   1.1327   -
σ(a)           -        0.2738   -         0.7465   0.4525    0.6513   0.3049    0.2116   -
Helix          2.2301   6.1368   19.5811   3.8909   -1.5560   6.4871   0.3003    0.5114   2.4602
σ(a)           -        0.4238   1.3214    0.5488   0.3934    0.7500   0.0880    0.0663   0.2881

Tab. 3. Results of the orthogonal distance fitting to the point set in Table 2.

Fig. 3. Orthogonal distance fitting to the point set in Table 2: (a) Helix fit; (b) Convergence
of the fit. Iteration number 0-3: 3D-circle, 4-12: circular cylinder, and 13-:
helix fit with the initial values of h = 10 and $\kappa = \pi$.
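As a worked instance of the XHG matrix (4.4), the derivatives for the helix can be written out directly from the helix model equation. This is a sketch derived here from the stated model, with $\mathbf{a}_g = (r, h)^T$ assumed as the ordering of the form parameters; because a curve has a single location parameter $u$, the $v$-derivatives and $G_2$ do not occur.

```latex
\mathbf{x} = \begin{pmatrix} r\cos u \\ r\sin u \\ hu/2\pi \end{pmatrix}, \qquad
\mathbf{x}_u = \begin{pmatrix} -r\sin u \\ r\cos u \\ h/2\pi \end{pmatrix}, \qquad
H = \mathbf{x}_{uu} = \begin{pmatrix} -r\cos u \\ -r\sin u \\ 0 \end{pmatrix},

G_0 = \frac{\partial \mathbf{x}}{\partial (r, h)} =
\begin{pmatrix} \cos u & 0 \\ \sin u & 0 \\ 0 & u/2\pi \end{pmatrix}, \qquad
G_1 = \frac{\partial \mathbf{x}_u}{\partial (r, h)} =
\begin{pmatrix} -\sin u & 0 \\ \cos u & 0 \\ 0 & 1/2\pi \end{pmatrix}.
```

With these, $\partial f/\partial u = \mathbf{x}_u^T\mathbf{x}_u - (\mathbf{x}_i - \mathbf{x})^T\mathbf{x}_{uu}$ and the form-parameter block of $\partial f/\partial\mathbf{a}$ is $\mathbf{x}_u^T G_0 - (\mathbf{x}_i - \mathbf{x})^T G_1$, a $1 \times 2$ row.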

5  Summary 

In this paper, we have reviewed the current orthogonal distance fitting algorithms for
parametric curves and surfaces in an easily understandable manner, and presented a new
algorithm. In each of the algorithms the model parameters are grouped and simultaneously
estimated in terms of form/position/rotation parameters. The ETH algorithm demands
a large amount of storage space and high computing cost, and the NPL algorithm
shows relatively poor performance with 3D-curve fitting. The new algorithm, the FhG
algorithm, has neither of these drawbacks. A
disadvantage of the FhG algorithm is that it requires the second derivatives $\partial^2\mathbf{x}/\partial\mathbf{a}_g\partial\mathbf{u}$.
The FhG algorithm does not necessarily require a good set of initial parameter values,
which can also be supplied internally, as demonstrated with the fitting examples.
From the viewpoint of implementation and application to a new model feature, the FhG
algorithm is universal and very efficient. Only the standard model equation (1.3) of
the new model feature is required, which has only a few form parameters. The
functional interpretation and treatment of the position/rotation parameters are basically
identical for all parametric models. The storage space and the computing time cost
are proportional to the number of given data points. Together with other orthogonal
distance fitting algorithms for implicit models [2], the FhG algorithm is certified by the
German federal authority PTB [5, 9], with a certification grade that the parameter
estimation accuracy is higher than 0.1 μm for length unit, and 0.1 μrad for angle unit, for
all parameters of all tested model features.
Abstract

We  present  a  method  for  matching  a  surface  in  three  dimensions  to  a  set  of  data  sampled 
from  the  surface  by  means  of  minimising  the  distances  from  the  data  points  to  the  closest 
point  on  the  surface.  This  method  of  association  is  affine  transformation  invariant  and 
as  such  is  very  useful  in  situations  where  the  coordinate  axes  are  essentially  arbitrary. 
Traditionally, this problem has been solved by minimising the $\ell_2$ norm of the distances
from the data points to the corresponding points in the surface, while the use of other
$\ell_p$ norms is less well known. We present a method for template matching in the $\ell_1$ norm
based  upon  a  method  of  directional  constraints  developed  by  Watson  for  the  related 
problem  of  orthogonal  distance  regression.  An  algorithm  for  this  method  is  given  and 
numerical  results  show  its  effectiveness. 

1  Introduction 

Template  matching  is  used  in  a  variety  of  applications  such  as  the  quality  assurance  of 
manufactured  artifacts  [1]  and  dental  metrology  [2].  Given  a  fixed  template,  i.e.,  curve 
or  surface,  and  a  set  of  data  in  a  different  frame  of  reference,  template  matching  involves 
finding  the  frame  transformation  which  maps  the  data  onto  the  template. 

A  typical  strategy  for  finding  the  optimal  transformation  parameters  in  the  template 
matching  problem  is  to  minimize,  in  some  norm,  the  orthogonal  distances  between  the 
transformed  data  and  the  template.  In  this  case,  the  template  matching  problem  can  be 
viewed  as  a  form  of  orthogonal  distance  regression  (ODR)  [3],  which  is  a  technique  com¬ 
monly  used  for  fitting  curves  and  surfaces  to  measured  data.  Therefore,  most  algorithms 
for solving the template matching problem are extensions of algorithms for ODR. Template
matching in the $\ell_2$ norm is addressed by Turner [3] and in the $\ell_\infty$ norm by Butler
et al. [1] as well as by Zwick [7] for the two dimensional case.

In  this  paper,  we  are  specifically  concerned  with  the  following  problem. 

Given a fixed differentiable parametric surface $\mathbf{f}(u, v)$ and a set of m data
$\{\mathbf{x}_i\}_{i=1}^m \in \mathbb{R}^3$, find points $\{\mathbf{f}(u_i, v_i)\}_{i=1}^m$, a rotation matrix $R_\Theta$, and a translation
vector $\mathbf{t}_0$ such that the $\ell_1$ norm of the residual distances $\{\|R_\Theta(\mathbf{x}_i - \mathbf{t}_0) - \mathbf{f}(u_i, v_i)\|_2\}_{i=1}^m$
is minimal.

This is the template matching problem in the $\ell_1$ norm, and although not as widely used
as the $\ell_2$ and $\ell_\infty$ counterparts, it does nonetheless have an important role to play. The
importance of the $\ell_1$ norm is that, generally speaking, any outlying data are effectively
ignored with the result that an approximation is obtained which is largely independent
of any unreliable data. This has particular importance when our data arise as a result
of some measurement process, perhaps involving many complicated and finely-tuned
instruments. For such a measurement scenario, any change in the assumed measurement
conditions can result in a datum which has gross error relative to other data. Thus, if we
choose a measure which is susceptible to outlying data, we are in danger of obtaining an
unrepresentative approximation. This situation is avoided by use of the $\ell_1$ norm and we
therefore advocate its use both here and in any situation involving measurement data
where a representative approximation is required.
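The insensitivity of the $\ell_1$ norm to outliers can be seen in the simplest possible fit, a single constant fitted to scalar data. This toy case is not from the paper: it relies only on the standard facts that the median minimises the sum of absolute residuals and the mean minimises the sum of squared residuals, and the data values are invented.

```python
import statistics


def l1_cost(c, data):
    """Sum of absolute residuals for the constant fit c."""
    return sum(abs(x - c) for x in data)


def l2_cost(c, data):
    """Sum of squared residuals for the constant fit c."""
    return sum((x - c) ** 2 for x in data)


# Four consistent measurements and one gross outlier (hypothetical values).
data = [1.0, 1.1, 0.9, 1.0, 25.0]

l1_fit = statistics.median(data)  # minimiser of l1_cost
l2_fit = statistics.mean(data)    # minimiser of l2_cost
```

Here the $\ell_1$ fit stays with the bulk of the data and passes exactly through one datum, while the $\ell_2$ fit is pulled far towards the outlier.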

A feature of optimal $\ell_1$ solutions is the likelihood of a small number of the data having
a  residual  of  zero,  and  it  is  therefore  unclear  whether  the  elements  of  the  Jacobian  matrix 
of  partial  derivatives  are  well-defined  for  these  points.  As  a  result,  use  of  the  usual 
Gauss-Newton  method  would  appear  to  be  handicapped  due  to  its  dependence  upon 
the  Jacobian  matrix  to  calculate  an  updated  transformation  estimate.  This  difficulty 
also  arises  in  the  conventional  ODR  fitting  problem  and  has  recently  been  considered  by 
Watson  [6].  His  solution  is  to  adopt  a  method  of  fitting  subject  to  directional  constraints. 
By  setting  these  directional  constraints  to  be  orthogonal  to  the  approximant,  Watson 
shows  not  only  that  the  Jacobian  is  defined  but  also  how  to  compute  its  elements  without 
incurring  a  build-up  of  rounding  error. 

In this paper, we extend Watson's constrained direction fitting routine to the template
matching problem. We show that Watson's results are equally valid for $\ell_1$ template
matching. Finally, we exploit these results to give a reliable algorithm for the $\ell_1$ template
matching problem.

The structure of this paper is as follows. Section 2 provides the results necessary
to justify the new technique. Section 3 describes the algorithm adopted to implement
the theory. Section 4 gives some numerical results for both a simple case and a more
challenging case. Finally, Section 5 concludes this paper and presents possibilities for
future work.


2  Theory 

We are concerned with the minimisation of the quantity

$$\|\mathbf{d}\|, \qquad (2.1)$$

where

$$d_i = \min_{u_i, v_i} \|\tilde{\mathbf{x}}_i - \mathbf{f}(u_i, v_i)\|_2, \qquad i = 1, 2, \ldots, m, \qquad (2.2)$$

and

$$\tilde{\mathbf{x}} = R_\Theta(\mathbf{x} - \mathbf{t}_0), \qquad (2.3)$$

with respect to the rotation parameters

$$\Theta = (\theta_1, \theta_2, \theta_3)^T,$$

the translation parameters $\mathbf{t}_0$, and the location parameters $U = \{(u_i, v_i)\}_{i=1}^m$.

This is a constrained problem and can be solved using a separation-of-variables approach
as described by Turner [3] among others. In this approach, the problem of obtaining the
transformation parameters

$$\mathbf{t} = \begin{pmatrix} \Theta \\ \mathbf{t}_0 \end{pmatrix},$$

is separated from the subproblem of obtaining the location parameters $U$. At each iteration,
the subproblem is solved to obtain an optimal $U$ for the current transformation
parameters $\mathbf{t}$, which is then used to obtain an update of the transformation parameters
themselves.

2.1 Considerations specific to the $\ell_1$ problem

Up to this point, we have not specified which norm we are using to measure the disparity
between the transformed data and the template. Since we will be particularly interested
in the $\ell_1$ case, this section discusses problems inherent in the solution of such a problem.

The major problem with solving non-linear $\ell_1$ problems is that in order to use a
technique such as the Gauss-Newton method, derivative information is required. Unfortunately,
derivatives of the distances $\mathbf{d}$ are not defined when a distance has a value of
zero. Such is the nature of $\ell_1$ approximation that zeros are to be expected at an optimal
solution [5]. Thus, it is unclear whether the Jacobian matrix is defined at these data
points. Recent work by Watson [6] has considered how the related problem of orthogonal
distance regression might be solved by considering distances to be measured along
fixed direction vectors $\mathbf{w}_i$. Orthogonal distance regression involves the fitting of a curve
or surface to a set of data where the residuals are taken to be the shortest distance from
the data to the approximant [3]. Template matching can be seen as a variant of this since
the residuals are measured in the same way, but we are only altering the position and
orientation of the approximant, rather than the actual shape itself. Thus, techniques for
orthogonal distance regression can be used successfully in template matching.

By means of these directional constraints, it is possible to show that if we choose the
directions $\mathbf{w}_i$ to be the orthogonal directions,

$$\mathbf{w}_i = \frac{\tilde{\mathbf{x}}_i - \mathbf{f}(u_i, v_i)}{\|\tilde{\mathbf{x}}_i - \mathbf{f}(u_i, v_i)\|_2},$$

then the derivatives are well defined in the limit as $\|\tilde{\mathbf{x}}_i - \mathbf{f}(u_i, v_i)\|_2 \to 0$.

This result may be summarised in the following Theorem (taken from Watson [6]).

Theorem 2.1 For parametric fitting, let the (usual) Gauss-Newton method produce a
sequence $\{\mathbf{t}\}$ such that there is a unique unit normal vector to the template at $\mathbf{f}(u_i, v_i)$, and
$\tilde{\mathbf{x}}_i$ remains on one side of the template. Then $\nabla_{\mathbf{t}} d_i$ is well defined on this sequence.

If $\mathbf{f}(u_i, v_i) \to \tilde{\mathbf{x}}_i$, then this formulation will lead to problems similar to those which we
are attempting to resolve, as a result of the quotient becoming undefined. As a result,
Watson [6] suggests leaving $\mathbf{w}_i$ unchanged once $d_i$ becomes small. By this method,
numerical problems arising as a result of a distance tending to zero may be avoided.
However, the algorithm will still tend to the correct solution provided that the small
residual corresponds to an interpolation point of the $\ell_1$ solution. If this is not the case,
then the solution will not be optimal, but will still be close to the optimal solution.

2.2  Possible  problems 

The most immediate problem that arises is how to ensure that there exists a point on
the template which is situated along the direction vector given from each datum. Clearly
in certain situations there will not exist such a point, corresponding to the case where
the direction vector lies within the tangent plane of the template in the region of the
datum. In such a situation there would seem to be two possible recourses available.

(1) Ignore these data.

(2) Choose the point on the template that is closest to the line through the datum
defined by the direction vector.

It  has  been  found  through  empirical  results  that  provided  the  problem  only  occurs  on 
certain  iterations  rather  than  as  a  result  of  poor  choice  of  the  direction  vectors  associated 
with  the  template,  ignoring  the  problem  data  is  the  better  option.  Use  of  the  second 
option  has  been  found  to  prevent  convergence  of  the  algorithm. 

3  Algorithm 

The  algorithm  to  implement  this  technique  consists  of  two  sub- algorithms,  each  related 
to  a  specific  section  of  the  main  algorithm.  These  sub- algorithms  are 

(1)  the  constrained  closest  point  problem, 

(2)  the  calculation  of  a  new  transformation  estimate. 

3.1  Constrained  closest  point  problem 

For  each  data  point  x*,  this  problem  is  that  of  finding  Ui  and  Vi  such  that  the  constraint 

x  —  f(u,v)  =  dw,  (3.1) 

is  satisfied  (subscripts  dropped  for  clarity).  Expanding  this  equation,  we  obtain 


If  we  pre-multiply  this  equation  by  aT,  we  obtain 

$$\mathbf{a}^T\mathbf{x} - \mathbf{a}^T\mathbf{f}(u, v) - d\,\mathbf{a}^T\mathbf{w} = 0. \qquad (3.2)$$


I. J. Anderson and C. Ross

Thus, by choosing $\mathbf{a}$ to be orthogonal to $\mathbf{w}$, we are able to eliminate d from equation
(3.2). Similarly, if we multiply equation (3.1) by $\mathbf{b}^T$ we obtain the equation

$$\mathbf{b}^T\mathbf{x} - \mathbf{b}^T\mathbf{f}(u, v) - d\,\mathbf{b}^T\mathbf{w} = 0,$$

from which d is again eliminated if $\mathbf{b}$ is also orthogonal to $\mathbf{w}$.
We may thereby reduce the system (3.1) to that of two (nonlinear) equations in two
unknowns (u and v). This system can then be solved by adopting a Newton-type method.
Our problem has been reduced to that of solving

$$\mathbf{F}(u, v) = [\mathbf{a} : \mathbf{b}]^T(\mathbf{x} - \mathbf{f}(u, v)) = \mathbf{0},$$

which has derivative

$$\nabla_{u,v}\mathbf{F} = -[\mathbf{a} : \mathbf{b}]^T(\nabla_u\mathbf{f} : \nabla_v\mathbf{f}),$$

by means of Newton's method, which involves adopting an iterative approach and solving

$$\nabla_{u,v}\mathbf{F}\begin{pmatrix} \delta u \\ \delta v \end{pmatrix} = -\mathbf{F}(u, v), \qquad (3.3)$$

at each stage to obtain better estimates $u + \delta u$ and $v + \delta v$. The quantities $\mathbf{F}(u, v)$ and
$\nabla_{u,v}\mathbf{F}$ are straightforward to calculate as they arise directly from the explicit parameterisation
of the template.

All that remains is the choice of $\mathbf{a}$ and $\mathbf{b}$. We obtain these vectors by taking the cross
product of $\mathbf{w}$ with two arbitrary vectors, resulting in two vectors which are orthogonal
to $\mathbf{w}$. More generally, the vectors $\mathbf{a}$ and $\mathbf{b}$ should be chosen to ensure that the system
(3.3) is well-conditioned.
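The constrained closest point iteration (3.3) can be sketched for a circular cylinder template $\mathbf{f}(u, v) = (r\cos u, r\sin u, v)^T$. The code below is an illustration under stated assumptions, not the authors' implementation: the cylinder parameterisation, the function names and the test values are hypothetical, and the pair $\mathbf{a}$, $\mathbf{b}$ is built as $\mathbf{a} = \mathbf{w} \times \mathbf{e}$, $\mathbf{b} = \mathbf{w} \times \mathbf{a}$ with $\mathbf{e}$ the coordinate axis least aligned with $\mathbf{w}$, which is one particular way of obtaining two vectors orthogonal to $\mathbf{w}$ rather than the two arbitrary vectors mentioned in the text.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def cylinder(r, u, v):
    """Template f(u, v) and its partial derivatives f_u, f_v."""
    f = (r * math.cos(u), r * math.sin(u), v)
    fu = (-r * math.sin(u), r * math.cos(u), 0.0)
    fv = (0.0, 0.0, 1.0)
    return f, fu, fv


def orthogonal_pair(w):
    """Two vectors orthogonal to w (assumed construction, see lead-in)."""
    k = min(range(3), key=lambda i: abs(w[i]))
    e = [0.0, 0.0, 0.0]
    e[k] = 1.0
    a = cross(w, e)
    b = cross(w, a)
    return a, b


def constrained_closest_point(x, w, r, u, v, tol=1e-12, max_iter=50):
    """Newton iteration (3.3) for F(u, v) = [a : b]^T (x - f(u, v)) = 0."""
    a, b = orthogonal_pair(w)
    for _ in range(max_iter):
        f, fu, fv = cylinder(r, u, v)
        res = [x[i] - f[i] for i in range(3)]
        F1, F2 = dot(a, res), dot(b, res)
        # Jacobian of F: -[a : b]^T (f_u : f_v)
        J11, J12 = -dot(a, fu), -dot(a, fv)
        J21, J22 = -dot(b, fu), -dot(b, fv)
        det = J11 * J22 - J12 * J21
        du = (-F1 * J22 + F2 * J12) / det   # Cramer's rule for J delta = -F
        dv = (-J11 * F2 + J21 * F1) / det
        u, v = u + du, v + dv
        if abs(du) + abs(dv) < tol:
            break
    f, _, _ = cylinder(r, u, v)
    d = dot(w, [x[i] - f[i] for i in range(3)]) / dot(w, w)  # from (3.1)
    return u, v, d


# Hypothetical datum outside a unit cylinder, measured along the radial direction.
u_sol, v_sol, d_sol = constrained_closest_point(
    x=(2.0, 0.0, 0.5), w=(1.0, 0.0, 0.0), r=1.0, u=0.3, v=0.0)
```

Once $u$ and $v$ are known, the signed distance $d$ follows from projecting (3.1) onto $\mathbf{w}$. The 2 by 2 determinant becomes small when $\mathbf{w}$ is nearly tangent to the template, which is the situation discussed in Section 2.2.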

3.2  Updating  the  transformation  estimate 

The  method  we  adopt  to  obtain  an  update  of  the  transformation  parameters  is  the 
Gauss-Newton  method.  This  involves  solving,  at  each  iteration,  the  problem 

$$J\,\delta\mathbf{t} = -\mathbf{d}, \qquad (3.4)$$

in the $\ell_1$ sense, where $J$ is the Jacobian matrix of partial derivatives with entries $J_{ij} =
\nabla_{t_j} d_i$. The estimate of the optimal transformation parameters is then updated according
to

$$\mathbf{t} = \mathbf{t} + \delta\mathbf{t}.$$

Thus, since the distances $\mathbf{d}$ are obtained from the constrained closest point subproblem,
we are left with the task of calculating the Jacobian matrix. For each datum, from
equation (3.1), we have that

$$\mathbf{x}(\mathbf{t}) - \mathbf{f}(u(\mathbf{t}), v(\mathbf{t})) = \mathbf{w}\,d(u(\mathbf{t}), v(\mathbf{t})),$$

where we have explicitly included the dependency of the distance d on the location
parameters $U$. Differentiating and rearranging, we obtain

$$\nabla_{\mathbf{t}}\mathbf{x} = \mathbf{w}\nabla_{\mathbf{t}}d + \nabla_U\mathbf{f}\,\nabla_{\mathbf{t}}U.$$

This is equivalent to the form

$$\nabla_{\mathbf{t}}\mathbf{x} = [\mathbf{w} : \nabla_U\mathbf{f}]\begin{pmatrix} \nabla_{\mathbf{t}}d \\ \nabla_{\mathbf{t}}U \end{pmatrix}.$$

Therefore,

$$J = \nabla_{\mathbf{t}}d = \mathbf{e}_1^T[\mathbf{w} : \nabla_U\mathbf{f}]^{-1}\nabla_{\mathbf{t}}\mathbf{x},$$

where $\mathbf{e}_1$ is the first coordinate vector. Having obtained the Jacobian matrix $J$ and the
distance vector $\mathbf{d}$, we are now in a position to solve the system (3.4) in order to update
our estimate of the optimal transformation parameters $\mathbf{t}$.
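The row $\mathbf{e}_1^T[\mathbf{w} : \nabla_U\mathbf{f}]^{-1}$ admits a closed form that may help interpretation. The following uses only the standard formula for the rows of the inverse of a $3 \times 3$ matrix and is not stated in the source; the symbol $\mathbf{n}$ for the (unnormalised) template normal is introduced here for illustration.

```latex
\mathbf{n} := \nabla_u\mathbf{f} \times \nabla_v\mathbf{f},
\qquad
\mathbf{e}_1^T\,[\mathbf{w} : \nabla_u\mathbf{f} : \nabla_v\mathbf{f}]^{-1}
  = \frac{\mathbf{n}^T}{\mathbf{n}^T\mathbf{w}},
\qquad\text{so that}\qquad
\nabla_{\mathbf{t}} d = \frac{\mathbf{n}^T\,\nabla_{\mathbf{t}}\mathbf{x}}{\mathbf{n}^T\mathbf{w}}.
```

The identity can be checked directly: the row $\mathbf{n}^T/(\mathbf{n}^T\mathbf{w})$ gives 1 when applied to $\mathbf{w}$ and 0 when applied to $\nabla_u\mathbf{f}$ or $\nabla_v\mathbf{f}$. The denominator vanishes exactly when $\mathbf{w}$ lies in the tangent plane of the template, which is the degenerate case described in Section 2.2, and no division by the distance $d$ itself is involved.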

We note that using the traditional orthogonal distances can lead to problems since
calculation of the Jacobian matrix involves division of each row by the corresponding
orthogonal distance, leading to exacerbation of rounding errors and possible division
by zero, especially in the $\ell_1$ case.

4  Numerical  results 

In this section, we present two examples to illustrate the techniques presented in this
paper. In the first, we have a small number of data which we wish to match to a given
plane. In the second, we have a larger number of data and we wish to match them to
a cylinder. In both cases, although analytical expressions are available to obtain the
constrained closest points on the templates, we nonetheless utilise the method presented
above in order to test its effectiveness.

4.1  Simple  problem 

Here  we  describe  the  problem  of  matching  a  representative  set  of  8  data  onto  the  plane 
defined  as 


Since this problem is rank deficient if we use all six possible transformation parameters,
we restrict ourselves to using a translation in the z-direction and rotations about the x
and y axes.

Having three degrees of freedom, we might expect to obtain an optimal $\ell_1$ solution
which interpolates 3 of the data. However, as we shall see, this is unattainable in general
and we can, in fact, only expect interpolation at two points. As Watson states [6], in
such a situation, the rate of convergence can be unacceptably slow. This is found to be
the case. Table 1 shows that not only is the convergence slow, but an optimal solution
is never obtained, with the objective function $\|d\|_1$ increasing occasionally.

    Iteration    norm(residuals)    norm(update)
        1            0.6662          4.9901e-02
        5            0.3008          3.5716e-04
       10            0.3007          8.8545e-06
       50            0.3006          9.1533e-06
      100            0.3008          3.8514e-04

Tab. 1  Progress of the Gauss-Newton method for planar data.

To ensure convergence, a simple line-search algorithm was adopted which searches
along the direction obtained from the Gauss-Newton step for the maximum reduction
in the objective function. With this modification, convergence is achieved in 3 iterations.
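
The paper does not give the details of the line search, so the following Python sketch is only one possible reading: it samples a fixed grid of step lengths along the Gauss-Newton direction and keeps the one giving the smallest objective value. The function name `line_search`, the grid of `n_trial` step lengths in $(0, 1]$ and the returned tuple are all assumptions.

```python
def line_search(objective, t, step, n_trial=20):
    """Search t + alpha * step, alpha in (0, 1], for the largest reduction.

    objective -- callable returning the scalar objective, e.g. ||d(t)||_1
    t, step   -- current parameters and Gauss-Newton step (lists of floats)
    Returns (new parameters, chosen alpha, objective value there).
    If no trial point improves on t, alpha = 0 and t is returned unchanged.
    """
    best_alpha, best_value = 0.0, objective(t)
    for k in range(1, n_trial + 1):
        alpha = k / n_trial
        trial = [ti + alpha * si for ti, si in zip(t, step)]
        value = objective(trial)
        if value < best_value:
            best_alpha, best_value = alpha, value
    t_new = [ti + best_alpha * si for ti, si in zip(t, step)]
    return t_new, best_alpha, best_value
```

Since a trial point is accepted only if it lowers the objective, the occasional increases of $\|d\|_1$ seen with full Gauss-Newton steps cannot occur with this safeguard.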

4.2  A  more  challenging  problem 

As a more challenging problem, we consider the matching of a set of 128 data which
supposedly represent a cylinder but which contain 8 wild points. The cylinder is
parametrised by $u$ and $v$ as

    $f(u, v) = (\cos u,\ \sin u,\ v)^T$,

resulting in a cylinder with unit radius oriented along the z-axis. Again, the problem
of matching the data onto this model is rank deficient. The rank deficiencies occur due
to rotations about the z-axis and translations along the z-axis. As such, we omit these
possible transformations.

Although we might initially expect to interpolate 4 data points at an optimal $\ell_1$
solution, we find that only two are guaranteed, although if a third point lies within
two radii of one of these two points, then three points can be guaranteed. Typically, this
will occur when the data is representative. For the data set we are considering, we expect
three interpolation points because the data represent the cylinder, and at the optimal
solution three interpolation points are indeed obtained. The “missing” interpolation
has the effect of slowing convergence of the Gauss-Newton method considerably, so
that after 100 iterations the algorithm had still not been deemed to converge. However,
with the introduction of a simple line-search method, the algorithm converged in five
iterations, as displayed in Table 2.


    Iteration    norm(residuals)    norm(update)
        1            0.9654          5.6796e-03
        2            0.9559          6.2932e-04
        3            0.9557          1.0141e-04
        4            0.9557          2.5812e-07
        5            0.9557          4.4006e-14

Tab. 2  Progress of the Gauss-Newton method for cylindrical data using a line-search.


5  Conclusions 

This paper has shown how perceived problems in $\ell_1$ template matching can be avoided
by use of the so-called “method of directional constraints”. In this method, the closest
point on the template along a given direction vector is calculated in order to obtain the
residuals between data and template. By then altering this direction vector to be the
normal to the surface at that projected point, the algorithm progresses to the expected
$\ell_1$ solution. Problems regarding undefined quotients are avoided by no longer updating
the direction vectors corresponding to a datum when the residual associated with that
point is below a certain tolerance.

This  work  forms  part  of  a  larger  project  to  consider  novel  approaches  to  ill-conditioned 
problems  in  metrology.  It  is  hoped  that  the  work  presented  in  this  paper  will  aid  in  the 
resolution  of  rank-deficient  systems  and  ill-conditioned  systems  by  altering  the  usual 
orthogonal  distances  to  be  these  directional  constraints,  which  should  remove  some  of 
the  rank  deficiency. 

As an example, consider the template matching problem where the template to be
matched is an infinite cylinder with axis along the z-axis. Using typical template matching
algorithms, this problem is rank deficient by two at the solution due to the possible
translation along the z-axis and the possible rotation about the z-axis. By introducing these
directional constraints, the rotational rank deficiency is almost completely resolved (there
are now two possible rotations to obtain the optimal matching rather than the infinite
number previously).

The $\ell_1$ norm is also being used in an attempt to resolve any rank deficiencies
and ill-conditioning present in the problem. This is achieved by ensuring that any
local deviations from the template (caused by, for example, wear) are “ignored” so that
regions of local deviations might be compared. This will then result in a resolution of
the uncertainty in the transformation parameters.

Abstract

The main aim of an inter-comparison in Metrology is to combine the information from
several laboratories to output a representative value $x_r$ and its probability distribution
function. Here, the proposed procedure identifies a simple model for this probability
function, by taking into account only the probability interval estimates as a measure of
the uncertainty in each laboratory. A mixture density model is chosen to characterize
the stochastic variability of the inter-comparison population considered as a whole. The
bootstrap method is applied to approximate the distribution function of the comparison
output in an automatic way.


1  Introduction 

The “mise en pratique” of the Mutual Recognition Arrangement (MRA), issued by
national metrological Institutions in 1999, prompted new studies and projects in Metrology,
mainly concerning the area of inter-laboratory comparisons.

Recently, considerable effort has been devoted to settling the choice of a suitable
statistical procedure to summarise inter-comparison data. The solution to this problem
is influenced by both metrological and statistical considerations, but it can also
depend on the physical quantity under comparison.

Several critical issues are now emerging. For instance, the statistical information
supplied by each laboratory is synthetic, since it comes from a data reduction process
performed on several experimental datasets. In each laboratory, assumptions and
statistical reduction procedures may be different and sometimes not fully documented,
or the a priori information on the original data may be insufficient to define a
“credible” probability distribution function (pdf) for the output quantities of the
inter-comparison.

The use of the whole sets of original data from each laboratory might be unfeasible
in the inter-comparison case, either because not all the needed data are available or
for practical reasons. At present, the practice is for each participant to supply synthetic
information $x_i$ to the inter-comparison and to use a location estimator to output the
representative value.

Efforts should be directed at improving the reliability of inter-comparison results by
requiring the use of any a priori information, and of its “credibility”, in order to move
towards the direct estimation of the output of the comparison, $x_r$.

This paper proposes the identification of a solution that resorts not to the synthetic
values and their point estimates of the standard uncertainty, but only to the probability
interval estimates as the measure of the uncertainty. This approach consists of two
parts: a modelling procedure to identify a simple mixture model able to approximate the
stochastic variability of the inter-comparison population as a whole; and a parametric Monte
Carlo algorithm to automatically estimate the probability distribution of the output $x_r$
and any accuracy measures at a prescribed precision.

The  concept  of  a  mixture  of  distribution  functions  occurs  when  a  population  made 
up  of  distinct  subgroups  is  sampled,  for  example,  in  biostatistics,  when  it  is  required 
to  measure  certain  characteristics  in  natural  populations  of  a  particular  species.  In  an 
inter-comparison  each  participant  constitutes  a  subgroup. 

The Monte Carlo method, based on the principle of mimicking sampling behaviour,
can always compute a numerical solution in an automatic way, even when the required
analytic calculations are not simple. If the Monte Carlo method is applied with the
principle of substitution (of the unknown probability function with a probability model
estimated from the given sample), the approach is known as the bootstrap approach [4]
and is already used in Metrology [2]. In [1] the case of a multivariate normal mixture
model is considered and the standard errors are estimated by means of the parametric
bootstrap. The present algorithm will be applied to a thermometric inter-comparison,
where data cannot be assumed to be normally distributed.

2  Data  structure  of  an  inter-comparison  with  interval  data 

The number, $N$, of laboratories involved in an inter-comparison is typically small. In
the $i$-th laboratory, the measurements are supposed to pertain to a single
probability distribution function, say $F_i(\Lambda)$, where $\Lambda$ is the parameter vector, which may be
partially unknown. The measurements are statistically analysed and reduced to provide
to the comparison the synthetic value $x_i$ and its uncertainty $u_i$ at 95% confidence level,
or a 95% uncertainty interval (95%CI): $((x_1, u_1), \ldots, (x_N, u_N))$.

In this work the uncertainty is considered as “a 95%CI rather than as a multiple of the
standard deviation” (see 4.3.4 in [6]). Then an aim of an inter-comparison is to combine
the input data from the labs to characterise a representative value of the inter-comparison,
i.e., the random variable $\theta$ and its pdf $F$. Hence a good estimate of the 95%CI for $\theta$ can
be obtained if the output pdf $F$ is a simple known function, describing the stochastic
variability of the inter-comparison data. In other cases a suitable approximation of the
expected value $E_F[X] = \int x\, dF(x)$ could be accepted to output the reference value $x_r$.

The inter-comparison data structure is summarised here in terms of interval estimates:

INPUT Sample: each one of the $N$ participants originates a 95%CI that is one
element of the inter-comparison sample:

    $\{[u_{il}, u_{iu}],\ i = 1, \ldots, N\}$.    (2.1)

Here no value $x_i$ in the interval $[u_{il}, u_{iu}]$ is chosen as representative; possible information
on $F_i$ (such as limited or unlimited support, symmetric or not) should be added. If a
laboratory does not supply any information on the pdf, the uniform distribution is
assumed.

Comparison OUTPUT: it includes the representative value and its 95%CI

    $(\theta, [e_l, e_u])$.    (2.2)

In many inter-comparisons, the differences from $\theta$ are also defined: $(y_i, [w_{il}, w_{iu}])$, where
$y_i = x_i - \theta$, $i = 1, \ldots, N$.

3  A  classical  approach  to  inter-comparisons 

Let us recall the solution to the inter-comparison problem through the traditional
estimator, the weighted mean. It is a location statistic that combines several measures and
their standard uncertainties $(x_i, u_i)_{i=1}^N$. It provides the following estimate for $\theta$,

    $\theta_w = u_w^2 \sum_{i=1}^N x_i / u_i^2$,  with  $u_w^{-2} = \sum_{i=1}^N u_i^{-2}$,    (3.1)

and the following symmetric 95%CI,

    $\theta_w \pm k u_w$,    (3.2)

where the coverage factor $k$ is taken as the value $t_{N-1,0.95}$ of the Student distribution, $N$
being small. In this approach, each $x_i$ is viewed as an unbiased estimate of the laboratory
mean value and the random variable $\theta_w$ is defined to be a linear combination of $N$
independent random variables $X_1, \ldots, X_N$, where $\{x_1, \ldots, x_N\}$ is an observed sample. $\theta_w$ is
supposed to be asymptotically normally distributed [6]. This estimator can be correctly
adopted to solve an inter-comparison problem if the assumption of the homogeneity of
the data is valid. This is equivalent to saying that, after considering the extent of the
real effect and bias in each laboratory, the laboratories yield on the average the same
value, so that the differences between the estimates are entirely due to random error.
In this case, the selected estimator $\theta_w$ appropriately estimates $\theta$ and (3.2) accurately
estimates its 95%CI.
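
As a small worked illustration of (3.1) and (3.2), the weighted mean and its symmetric interval can be computed as follows. The function names are ours, the weights $1/u_i^2$ are the standard inverse-variance weights, and the coverage factor $k$ is left to the caller because the Python standard library has no Student quantile function.

```python
import math

def weighted_mean(x, u):
    """Weighted mean theta_w and its uncertainty u_w, as in (3.1).

    x -- synthetic values x_i of the laboratories
    u -- their standard uncertainties u_i (all positive)
    """
    inv_var = [1.0 / ui ** 2 for ui in u]
    u_w_sq = 1.0 / sum(inv_var)                         # u_w^2
    theta_w = u_w_sq * sum(wi * xi for wi, xi in zip(inv_var, x))
    return theta_w, math.sqrt(u_w_sq)

def symmetric_interval(theta_w, u_w, k):
    """Symmetric interval theta_w +/- k * u_w, as in (3.2).

    k is the coverage factor, e.g. a Student quantile looked up elsewhere.
    """
    return theta_w - k * u_w, theta_w + k * u_w
```

With equal uncertainties the estimate reduces to the arithmetic mean. A laboratory reporting a small $u_i$ dominates the result, which is why the credibility of the stated uncertainties matters so much for this estimator.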

Obstacles  to  applying  this  approach  to  a  key-comparison  have  been  discussed  in  [3]. 
The  “credibility”  of  the  representative  values  Xi,  and  of  their  uncertainty  can  critically 
affect  the  accuracy  of  the  estimate  of  the  representative  value  xr.  Moreover,  the  peculiar 
characteristics  of  a  typical  inter-comparison  sample  ((1)  its  very  limited  size,  from  a 
statistical  point  of  view,  (2)  different  experimental  methods,  used  in  each  laboratory) 
often  imply  that  the  statistical  assumptions  are  not  satisfied,  as  for  example  in  several 
thermometric cases. Indeed, the first characteristic implies that the Central Limit
Theorem and the asymptotic theory do not hold. Then the normal distribution cannot be
properly used to infer the estimates in (3.2).

Another example of the inadequacy of the weighted mean approach is when some
laboratories provide data affected by bias, resulting from skewed distributions underlying
their measurements. The symmetric confidence interval of (3.2) cannot be considered an
accurate approximation$^1$ of the true one, since it does not adjust for the skewness. Finally,
it is necessary to point out that the homogeneity condition among the laboratories must
be assured in some sense, otherwise it would be impossible to attempt the computation
of any summary estimate and its associated uncertainty.

4  The  approach  based  on  interval  data 

4.1  The  mixture  density  function 

This paper proposes to construct a simple model for the output pdf, and to estimate
its expected value $\theta$ without requiring strong assumptions such as $N$ large or each $F_i$
normal. This approach enables us to compute the probability interval of the output
value in terms of the identified density in each laboratory. The stochastic variability of
the population of inter-comparison data is directly considered in the modelling approach
as a whole, by means of a so-called mixture distribution model [5]. This model, being
a linear superposition of several (say $N$) component densities, appears to be suitable
from a computational point of view and can be embedded in a bootstrap algorithm to
simulate several data needed to predict the output quantities.

In an inter-comparison, let us suppose that a density function $f_i(x; \Lambda^{(i)})$ is assumed
for the $i$-th laboratory; then the following density mixture is identified to model the
output pdf, where the parameter vector is $\Lambda = (\Lambda^{(1)}, \ldots, \Lambda^{(N)})$ and the given weights
$\pi_i > 0$, $i = 1, \ldots, N$, sum to one:

    $g(x; \Lambda) = \sum_{i=1}^N \pi_i f_i(x; \Lambda^{(i)})$.    (4.1)

To compute the output as an estimate of the expected value of the mixture, $\theta = E_{G(\Lambda)}[X]$,
the probability function $G(\Lambda)$, corresponding to the density in (4.1), must be known.
When some laboratory provides only partial information on a pdf, we propose to identify
its experimental variability by one of the following simple probabilistic models: uniform,
normal or triangular pdf (right or left or symmetric triangular). Indeed, in thermometric
experiments these three probabilistic models can represent several common stochastic
variabilities for measurements, such as a limited or unlimited support, symmetric or not.

We want the mixture parameters to be estimated by means of the INPUT Sample (2.1),
as required in a bootstrap approach. Let us call $I_i$ the probability interval to which
100% of the measurements of the $i$-th laboratory are supposed to pertain. For the uniform and
the triangular types, the parameters $\Lambda^{(i)}$ are defined to be the extremes of $I_i = [\lambda_{il}, \lambda_{iu}]$. For
the normal model the parameters are the mean $x_i$ and the variance $u_i$, while $I_i$ becomes
$(-\infty, +\infty)$.

A  right  triangular  pdf  (RT),  a  left  triangular  pdf  (LT)  or  symmetric  triangular  pdf 
(ST)  is  chosen  according  to  the  position  where  the  maximum  of  the  probability  density 
occurs,  i.e.,  one  extreme  or  the  middle  point  of  I. 

$^1$ A 95%CI $[e_l, e_u]$ for $\theta$ is defined to be accurate if the following holds for every possible
value of $\theta$: $\mathrm{Prob}_G\{\theta < e_l\} = 0.025$ and $\mathrm{Prob}_G\{\theta > e_u\} = 0.025$.

To compute the two components of the vector $\Lambda^{(i)} = (\lambda_{il}, \lambda_{iu})^T$ given the $i$-th input
interval, a 0.025 portion of probability mass is added outside of each extreme, according
to the supplied density shape. For example, if the ST density is chosen, the parameters
are computed by:

    $\lambda_{il} = (0.89\, u_{il} - 0.11\, u_{iu})/0.78$,    $\lambda_{iu} = (0.89\, u_{iu} - 0.11\, u_{il})/0.78$.
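
The rounded coefficients 0.89, 0.11 and 0.78 can be checked with a short calculation. For a symmetric triangular pdf of width $W$, the mass lying within a distance $cW$ of either extreme is $2c^2$, so a tail of 0.025 corresponds to $c = \sqrt{0.0125} \approx 0.112$. The Python sketch below carries this out without rounding; the function names `st_parameters` and `st_cdf` are ours and the derivation is our reading of the construction described above.

```python
import math

def st_parameters(u_l, u_u, tail=0.025):
    """Extremes (lam_l, lam_u) of a symmetric triangular pdf whose central
    (1 - 2 * tail) probability interval is [u_l, u_u]."""
    c = math.sqrt(tail / 2.0)             # solves 2 * c**2 = tail
    width = (u_u - u_l) / (1.0 - 2.0 * c)
    return u_l - c * width, u_u + c * width

def st_cdf(x, lam_l, lam_u):
    """Distribution function of the symmetric triangular pdf on [lam_l, lam_u]."""
    width = lam_u - lam_l
    mid = 0.5 * (lam_l + lam_u)
    if x <= lam_l:
        return 0.0
    if x >= lam_u:
        return 1.0
    if x <= mid:
        return 2.0 * ((x - lam_l) / width) ** 2
    return 1.0 - 2.0 * ((lam_u - x) / width) ** 2
```

Writing $1 - 2c \approx 0.776$ and $1 - c \approx 0.888$ recovers the coefficients quoted in the text to two decimal places.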

The mixture weights could be used to associate a degree of “credibility” to each
laboratory. Then the choice $\pi_i = 1/N$, $i = 1, \ldots, N$, implies that every laboratory equally
contributes to the inter-comparison.

When the mixture $G(\Lambda)$ is completely identified, it can be used to simulate data and
to approximate the output value in the Monte Carlo algorithm.

4.2  The  bootstrap  algorithm 

To avoid integral computations to estimate $\theta$ and its variance, the Monte Carlo method
is commonly used to approximate them within a given precision. Since the parametric
bootstrap approach resamples from a parametric distribution model, in this case
the mixture model $G(\Lambda)$ is adopted to approximate the following distribution,

    $H(x) = \mathrm{Prob}_{\hat G}\{\theta^* \le x\}$.    (4.2)

The Monte Carlo method simulates a sufficiently high number $B$ of data $\theta^*$ from
$\hat G = G(\hat\Lambda)$, to compute

    $\hat H_B(x) = \frac{1}{B} \sum_{b=1}^{B} \mathbb{I}\{\theta_b^* \le x\}$,    (4.3)

where the function $\mathbb{I}\{A\}$ is the indicator function of the set $A$. With probability one, it is
known that the Monte Carlo approximation converges to the true value as $B \to \infty$. The
Monte Carlo algorithm has been developed for a mixture density to estimate the
comparison output. A hierarchical resampling strategy is used to reproduce the hierarchical
variability in the inter-comparison population, through the following steps:

(1) (a) Choose at random the index, say $k$, of the $k$-th laboratory by randomly resampling
        with replacement from the set $\{1, \ldots, N\}$:

            $K \sim \mathrm{Prob}\{K = k\} = \pi_k$.

    (b) Given $k$, generate at random, from the selected component $F_k$ of the mixture
        distribution, a bootstrap value $\theta^*$ in $[\lambda_{kl}, \lambda_{ku}]$.

    Repeat Step 1 $B$ times to simulate the full bootstrap sample $\theta_1^*, \ldots, \theta_B^*$.

(2) Approximate the bootstrap mixture distribution as in (4.3) to compute:

    - the bootstrap estimate of the expected mean

          $\theta_B^* = \frac{1}{B} \sum_{b=1}^{B} \theta_b^*$,    (4.4)

    - the bootstrap standard deviation:
      $Sd_B^* = \left[ \frac{1}{B-1} \sum_{b=1}^{B} (\theta_b^* - \theta_B^*)^2 \right]^{1/2}$,

    - the 95%CI $[e_l^*, e_u^*]$, where the two extremes are computed as the $\alpha$-th quantile$^2$
      ($\alpha = 0.025$) of the bootstrap distribution, $\hat H_B^{-1}(\alpha) = q_B^{*(\alpha)}$, hence
      $e_l^* = q_B^{*(\alpha)}$ and $e_u^* = q_B^{*(1-\alpha)}$.

    Laboratory    Point estimate (value; uncertainty)    Interval data
    Lab1          (-0.05; 0.15)                          [-0.347, 0.247]
    Lab2          ( 0.03; 0.30)                          [-0.564, 0.624]
    Lab3          ( 0.18; 0.15)                          [-0.117, 0.477]
    Lab4          ( 0.04; 0.15)                          [-0.257, 0.337]
    Lab5          ( 0.71; 0.15)                          [ 0.413, 1.007]
    Lab6          (-0.01; 0.15)                          [-0.307, 0.287]
    Lab7          (-0.03; 0.15)                          [-0.327, 0.267]

Tab. 1.  Inter-comparison of 7 laboratories [7]: point estimates and simulated
interval data.

In Step 1(b) the inverse transformation method has been used for simulating a
random variable $X$ having a continuous distribution $F_k$. For example,
$X = \lambda_{kl} + (\lambda_{ku} - \lambda_{kl})\,V$, with $V$ uniform on $(0, 1)$, for a uniform
$U(\lambda_{kl}, \lambda_{ku})$ random variable. In Step 2 the bootstrap CI has been computed by means of
the percentile method (see footnote 2). However, when the normal distribution is involved
in the mixture, the t-bootstrap method gives more appropriate results [4]. To determine
$B$ in approximating the bootstrap confidence interval, the coefficient of variation [4] can
be used. The value of $B$ is increased until the coefficient of variation $cv$ of the sample
quantile approaches the given precision $\delta_0$. Indeed, from a metrological point of view, it
appears easier to choose $\delta_0$ instead of $B$ as stopping rule in Step 1.
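
Steps 1 and 2 can be sketched in Python for the case of a mixture of uniform densities built from the interval data of Table 1. This is a minimal sketch, not the authors' implementation. The function names are ours, the rule in `uniform_support` (stretching each 95% interval so that a mass of 0.025 lies beyond each extreme) is our reading of the construction in Section 4.1, and $B$ is fixed in advance rather than chosen through the coefficient-of-variation rule.

```python
import math
import random

# 95% intervals [u_il, u_iu] of Lab1..Lab7, copied from Table 1.
INTERVALS = [(-0.347, 0.247), (-0.564, 0.624), (-0.117, 0.477),
             (-0.257, 0.337), (0.413, 1.007), (-0.307, 0.287),
             (-0.327, 0.267)]

def uniform_support(u_l, u_u, tail=0.025):
    """Full support of a uniform pdf whose central 95% interval is [u_l, u_u]."""
    width = (u_u - u_l) / (1.0 - 2.0 * tail)
    return u_l - tail * width, u_u + tail * width

def bootstrap_mixture(intervals, weights, B, seed=0):
    """Steps 1(a) and 1(b), repeated B times, for a mixture of uniforms."""
    rng = random.Random(seed)
    supports = [uniform_support(u_l, u_u) for u_l, u_u in intervals]
    # Step 1(a): pick a laboratory index k with probability pi_k.
    labs = rng.choices(range(len(supports)), weights=weights, k=B)
    sample = []
    for k in labs:
        lam_l, lam_u = supports[k]
        # Step 1(b): inverse transformation method for a uniform variable.
        sample.append(lam_l + (lam_u - lam_l) * rng.random())
    return sample

def percentile(ordered, alpha):
    """The (alpha * B)-th value, rounded up, of an ascending list."""
    B = len(ordered)
    index = min(B - 1, max(0, math.ceil(alpha * B) - 1))
    return ordered[index]

def summarise(sample, alpha=0.025):
    """Step 2: bootstrap mean (4.4), standard deviation and percentile CI."""
    B = len(sample)
    mean = sum(sample) / B
    sd = math.sqrt(sum((s - mean) ** 2 for s in sample) / (B - 1))
    ordered = sorted(sample)
    return mean, sd, (percentile(ordered, alpha), percentile(ordered, 1.0 - alpha))
```

A call such as `summarise(bootstrap_mixture(INTERVALS, [1 / 7] * 7, B=2209))` returns the three quantities of Step 2. The values depend on the random seed, so they should only be expected to agree with published figures to within Monte Carlo error.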

We would also like to have an automatic tool to investigate how well every laboratory
contributes to the comparison, or to detect the possible presence of heterogeneous data.
Here the concept of jackknife-after-bootstrap has been adopted to compute the mean
and the bootstrap 95%CI. It is simply obtained by the following algorithm:

    for $i = 1, \ldots, N$, leave out the $i$-th lab and compute $\theta_B^*(-i)$ and $q_B^*(-i)$;
    compare the $N$ jackknife estimates to detect outlier values.
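
One way to realise this leave-one-out loop, sketched below in Python, is to keep the laboratory index with every bootstrap draw and to discard the draws of the $i$-th laboratory in turn. With equal weights this is equivalent to sampling from the mixture of the remaining $N - 1$ laboratories. Whether the authors reuse one bootstrap sample in this way or resample for each $i$ is not stated, so this choice, the function names and the uniform component model are all assumptions.

```python
import math
import random

# 95% intervals [u_il, u_iu] of Lab1..Lab7, copied from Table 1.
INTERVALS = [(-0.347, 0.247), (-0.564, 0.624), (-0.117, 0.477),
             (-0.257, 0.337), (0.413, 1.007), (-0.307, 0.287),
             (-0.327, 0.267)]

def labelled_bootstrap(intervals, B, seed=0, tail=0.025):
    """Draw B values from an equal-weight mixture of uniforms.

    Each draw is returned as (lab index, value) so that the draws of one
    laboratory can be removed afterwards.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(B):
        k = rng.randrange(len(intervals))
        u_l, u_u = intervals[k]
        width = (u_u - u_l) / (1.0 - 2.0 * tail)   # full uniform support
        draws.append((k, u_l - tail * width + width * rng.random()))
    return draws

def sd_and_interval(values, alpha=0.025):
    """Standard deviation and percentile interval of a list of values."""
    B = len(values)
    mean = sum(values) / B
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (B - 1))
    ordered = sorted(values)
    lo = ordered[max(0, math.ceil(alpha * B) - 1)]
    hi = ordered[min(B - 1, math.ceil((1.0 - alpha) * B) - 1)]
    return sd, (lo, hi)

def jackknife_after_bootstrap(draws, n_labs):
    """For each lab i, summarise the draws that did not come from lab i."""
    results = []
    for i in range(n_labs):
        kept = [value for k, value in draws if k != i]
        results.append(sd_and_interval(kept))
    return results
```

A laboratory whose removal changes the standard deviation or the interval far more than the others is a candidate outlier, which is the comparison the algorithm above calls for.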

5  An  application  in  thermometry 

The proposed method is applied to an inter-comparison of Temperature Fixed
Points, involving $N = 7$ laboratories [7]. Each lab provided data with the 95% standard
uncertainty (Table 1: first item).

The second item (square brackets in the same table) represents the interval data
generated with (3.2), which are used to perform this simulated example. Since no specific
pdf was supplied, the mixture density has been constructed assuming the
uniform type for each participant and equal weights. The parameters of every uniform
density were computed using the interval data, and the resulting mixture density was used
in the resampling step of the algorithm to compute the representative value and its
probability interval with $\delta_0 = 0.05$. In Figure 1 (left) the bootstrap histogram, which
approximates the mixture density, shows a bimodal behaviour. The computations are
obtained for $\delta_0 = 0.05$, or $B = 2209$: $\theta^* = 0.14$, bootstrap standard deviation $Sd^* = 0.33$,
95%CI [-0.35, 0.92].

$^2$ The percentile method for a statistic $\theta$, based on $B$ bootstrap samples, simply gives as the
$\alpha$-percentile the $(\alpha B)$th value in the ordered list of the $\theta_b^*$.

[Figure 1: two histogram panels, “Mixture of 7 Uniform densities” (left) and “Mixture of 6
Uniform and 1 RT densities” (right).]

Fig. 1.  Bootstrap histograms, $B = 2209$: left, mixture of 7 uniform distributions; right,
mixture of 6 uniform plus one RT density for Lab5.

The proposed algorithm was also applied with a mixture of seven normal densities,
and the results are $\theta^* = 0.13$, $Sd^* = 0.43$, bootstrap 95%CI [-0.61, 1.1] for $B = 4752$.
Assuming unlimited symmetric distributions to model the output pdf results in
a wider 95%CI for a mixture of normal densities.

By comparing the jackknife results in Table 2, Lab5 appears to supply unusual values.
To directly consider this behaviour in the inter-comparison, a mixture of six uniform
densities plus an RT density, identifying Lab5, has been constructed. The approximated
bootstrap distribution is displayed in Fig. 1 (right), with bootstrap estimates $\theta^* = 0.15$,
standard deviation $Sd^* = 0.35$ and [-0.35, 0.96] for the bootstrap 95%CI, obtained for
$B = 2209$.

6  Conclusions 

The problem of the inter-comparison data has been described, and a new approach has
been proposed. It is based on the uncertainty estimates, which should be provided by each
laboratory as an interval estimate at 95% confidence level together with information, even
if partial, on the probability function. The constructive procedure directly characterises
the stochastic variability of the reference value of the inter-comparison, by means of a
mixture density model. The result of an inter-comparison is then viewed as a random
variable, not directly measured, being the output of a complex process that involves
measures, statistical information and metrological considerations. These considerations
suggest constructing a mixture, with weights $\pi_i$ to take into account each participating
laboratory according to its credibility.

    Lab left out    Standard deviation    95%CI
    Lab1            0.34                  [-0.45, 0.92]
    Lab2            0.32                  [-0.31, 0.94]
    Lab3            0.34                  [-0.40, 0.91]
    Lab4            0.34                  [-0.35, 0.92]
    Lab5            0.23                  [-0.42, 0.48]
    Lab6            0.34                  [-0.36, 0.95]
    Lab7            0.34                  [-0.42, 0.92]

Tab. 2.  Jackknife-after-bootstrap estimates. Standard deviation and 95%CI for
mixture of 6 uniform densities ($B \approx 1000$): in the $i$-th item, Lab$i$ is left out.

The parametric bootstrap approach has been adopted to estimate the inter-comparison
output in a simple and automatic way, taking into account information, even if partial,
on the probability distributions underlying the hierarchical data of the participating
laboratories.

The method can be applied even with a limited number of laboratories, as shown
in the thermal example, where $N = 7$ and the experimental conditions suggested adopting
skewed distributions. The automatic jackknife method of detecting heterogeneous
data succeeded in revealing an unusual value. To take this condition into account, a
mixture of six uniform densities plus an RT density to identify Lab5 could be better used.
The choice of equal weights emphasises that all the standards have contributed equally
to the inter-comparison.

The bootstrap procedure, fully developed for a class of five simple distribution
functions often used in thermal metrology, could be adapted to consider other distributions,
when the synthetic data information provided by the laboratories, as summarised
in Section 2, allows the mixture parameters to be computed.

Abstract

Self-calibration  techniques  have  been  used  extensively  in  co-ordinate  metrology.  At  their 
most  developed,  they  are  able  to  extract  all  systematic  error  behaviour  associated  with 
the  measuring  instrument  as  well  as  determining  the  geometry  of  the  artefact  being 
measured.  However,  this  is  generally  at  the  expense  of  introducing  extra  parameters 
leading  to  moderately  large  observation  matrices.  Fortunately,  these  matrices  tend  to 
have  sparse,  block  structure  in  which  the  nonzero  elements  are  confined  to  much  smaller 
submatrices.  This  structure  can  be  exploited  either  in  direct  approaches  in  which  QR 
factorisations are performed or in iterative algorithms which depend on matrix-vector
multiplications. In this paper, we describe self-calibration approaches associated with high
accuracy dimensional assessment by co-ordinate measuring systems, highlighting how the
associated  optimisation  problems  can  be  presented  compactly  and  solved  efficiently.  The 
self-calibration  techniques  lead  to  uncertainties  significantly  smaller  than  can  be  expected 
from  standard  methods. 

1  Introduction 

An important activity in metrology is the calibration of instruments and artefacts.
Calibration defines a rule which converts the values output by the instrument’s sensor(s)
to values that can be related to the appropriate standard (SI or derived) units.
Importantly, these calibrated values must be assigned uncertainties that reliably take
into  account  the  uncertainties  of  all  quantities  that  have  an  influence.  As  a  consequence, 
the  size  and  complexity  of  the  computational  tasks  associated  with  the  data  analysis  can 
be  significant,  even  for  instruments  that  appear  to  be  of  simple  design  and  operation. 
It  is  thus  beneficial  to  design  and  implement  algorithms  that  are  efficient  with  respect 
to  computation  and  memory.  Fortunately,  many  of  the  calibration  problems  give  rise  to 
systems  of  equations  with  a  well  defined  sparsity  structure. 

The  rest  of  this  paper  is  organised  as  follows.  In  Section  2  we  review  least  squares 
approaches  to  calibration  problems  and  go  on  to  describe  self-calibration  problems  in 
co-ordinate  metrology  in  Section  3.  Sections  4  and  5  describe  solution  methods  for  two 
types  of  sparsity  structure.  Our  concluding  remarks  are  given  in  Section  6. 

2  Least  squares  solution  to  calibration  problems 

In many calibration problems, the observation equations involving measurements $y_i$
can be expressed as $y_i = \phi_i(a) + \epsilon_i$, where $\phi_i$ is a function depending on
parameters $a = (a_1, \ldots, a_n)^T$ specifying the behaviour of the instrument, and $\epsilon_i$
represents random measurement error. For a set of measurement data $\{y_i\}_{i=1}^m$, best
estimates $a^*$ of the calibration parameters $a$ are determined by solving

$$\min_a \sum_{i=1}^m f_i^2(a) = f^T f, \tag{2.1}$$

where $f_i(a) = y_i - \phi_i(a)$. The most common approach to solving this problem is derived
from the Gauss-Newton algorithm; see, for example, [5]. If $a$ is an estimate of the solution
and $J$ is the Jacobian matrix defined at $a$ by $J_{ij} = \partial f_i / \partial a_j$, then an updated
estimate of the solution is $a + p$, where $p$ solves the Jacobian system

$$Jp = -f$$

in the least squares sense. Starting with an appropriate initial estimate of $a$, these steps
are repeated until convergence criteria are met.
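The iteration just described can be sketched in a few lines. The following is a minimal illustration only, not the solver used in this paper: the exponential response function, the noise-free data and the names `gauss_newton`, `residual` and `jacobian` are assumptions introduced for the example, and no line search or damping is included.

```python
import numpy as np

def gauss_newton(residual, jacobian, a0, tol=1e-10, max_iter=50):
    """Minimise sum_i f_i(a)^2 by repeated linear least squares steps."""
    a = np.asarray(a0, dtype=float)
    for _ in range(max_iter):
        f = residual(a)
        J = jacobian(a)
        # Solve J p = -f in the least squares sense.
        p, *_ = np.linalg.lstsq(J, -f, rcond=None)
        a = a + p
        if np.linalg.norm(p) <= tol * (1.0 + np.linalg.norm(a)):
            break
    return a

# Hypothetical instrument response phi_i(a) = a_1 * exp(a_2 * t_i).
t = np.linspace(0.0, 1.0, 20)
a_true = np.array([2.0, -1.5])
y = a_true[0] * np.exp(a_true[1] * t)      # noise-free data, for illustration

def residual(a):
    return y - a[0] * np.exp(a[1] * t)     # f_i(a) = y_i - phi_i(a)

def jacobian(a):
    e = np.exp(a[1] * t)
    return np.column_stack([-e, -a[0] * t * e])   # J_ij = d f_i / d a_j

a_hat = gauss_newton(residual, jacobian, a0=[1.8, -1.3])
```

The initial estimate is deliberately chosen close to the solution; in practice a safeguarded step would usually be needed.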

A numerically stable method of solving the Jacobian system is to find a factorisation
$J = QR$, where $Q$ is an $m \times n$ matrix with orthonormal columns and $R$ is an
upper-triangular matrix of order $n$ (see, e.g., [1, 6]). The solution $p$ is determined
efficiently by solving the upper-triangular system

$$Rp = -Q^T f,$$

using back substitution. The matrix $Q$ can be constructed using either Householder
reflections, which process the Jacobian matrix a column at a time, or Givens plane
rotations, which process the matrix row-wise. For either approach the orthogonal
factorisation requires $O(mn^2)$ operations.
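As a small illustration of this direct approach, the sketch below forms the reduced QR factorisation with NumPy and solves the triangular system by explicit back substitution. The random test matrix and the function names are assumptions made for the example.

```python
import numpy as np

def back_substitute(R, b):
    """Solve R p = b for upper-triangular, non-singular R."""
    n = R.shape[0]
    p = np.zeros(n)
    for i in range(n - 1, -1, -1):
        p[i] = (b[i] - R[i, i + 1:] @ p[i + 1:]) / R[i, i]
    return p

def qr_step(J, f):
    """Least squares solution of J p = -f via J = QR."""
    Q, R = np.linalg.qr(J)          # reduced form: Q is m x n, R is n x n
    return back_substitute(R, -Q.T @ f)

rng = np.random.default_rng(0)
J = rng.standard_normal((8, 3))     # illustrative full-rank Jacobian
f = rng.standard_normal(8)
p = qr_step(J, f)
p_ref, *_ = np.linalg.lstsq(J, -f, rcond=None)
```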

An  alternative  to  the  direct  approaches  to  solve  matrix  equations  is  to  use  iterative 
procedures  based  on  conjugate  gradients.  The  advantage  of  these  approaches  is  that  they 
involve  only  matrix-vector  multiplications  and  for  sparse  matrices  these  multiplications 
can  be  made  efficient.  In  particular,  the  LSQR  algorithm  of  Paige  and  Saunders  [7] 
implements  an  iterative  approach  to  solving  linear  least  squares  problems. 
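To illustrate that such methods need only products with $J$ and $J^T$, the sketch below implements CGLS (conjugate gradients applied to the normal equations). This is a simpler relative of LSQR, not the LSQR algorithm of [7] itself; the function name, the stopping rule and the test data are assumptions made for the example.

```python
import numpy as np

def cgls(matvec, rmatvec, b, n, tol=1e-12, max_iter=200):
    """Minimise ||A x - b||_2 using only v -> A v and w -> A^T w."""
    x = np.zeros(n)
    r = np.array(b, dtype=float)        # residual b - A x for x = 0
    s = rmatvec(r)                      # gradient direction A^T r
    p = s.copy()
    gamma = float(s @ s)
    gamma0 = gamma
    for _ in range(max_iter):
        if gamma <= (tol ** 2) * gamma0:
            break                       # ||A^T r|| is small relative to its start
        q = matvec(p)
        alpha = gamma / float(q @ q)
        x += alpha * p
        r -= alpha * q
        s = rmatvec(r)
        gamma_new = float(s @ s)
        p = s + (gamma_new / gamma) * p
        gamma = gamma_new
    return x

rng = np.random.default_rng(3)
A = rng.standard_normal((30, 5))
b = rng.standard_normal(30)
x_cg = cgls(lambda v: A @ v, lambda w: A.T @ w, b, n=5)
```

For a sparse or structured matrix, only the two callbacks need to change.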

Often, linear equality constraints on the parameters of the form $C^T a = c$, where $C$
is an $n \times p$ matrix, $p < n$, are required to eliminate degrees of freedom in the problem.
However, we can use orthogonal projections to eliminate these constraints. Suppose $C$
is of full column rank and has QR factorisation

$$C = [V_1 \ V_2] \begin{bmatrix} S \\ 0 \end{bmatrix},$$

where $V_1$ and $V_2$, respectively, are the first $p$ and last $n - p$ columns of the orthogonal
factor $V$, and $S$ is the upper-triangular factor. If $a_0$ is a solution of $C^T a = c$, then for
any $(n - p)$-vector $\tilde a$, $a = a_0 + V_2 \tilde a$ automatically satisfies the constraints and the
optimisation problem can be reformulated as the unconstrained non-linear least squares problem

$$\min_{\tilde a} \sum_{i=1}^m f_i^2(a_0 + V_2 \tilde a),$$

involving the reduced set of parameters $\tilde a$. We note that the associated Jacobian matrix
is simply $\tilde J = J V_2$, where $J_{ij} = \partial f_i / \partial a_j$, as before.

Unfortunately, even if $J$ has structure, $\tilde J = J V_2$ could be full. For indirect approaches,
this is of little consequence since the matrix-vector multiplications can be formed in two
stages (e.g., $y = V_2 x$, $z = Jy$), each of which can be implemented efficiently. For a direct
approach, it may be possible to implement the constraints in such a way as to minimise
the amount of fill-in during the orthogonal factorisation stage.
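The projection approach can be sketched as follows for a linear model $f(a) = y - Ja$. This is an illustration under that assumption only; the function name `constrained_lstsq`, the single sum constraint and the random data are hypothetical.

```python
import numpy as np

def constrained_lstsq(J, y, C, c):
    """Minimise ||y - J a||^2 subject to C^T a = c by null-space projection."""
    n, p = C.shape
    V, S = np.linalg.qr(C, mode="complete")   # V is n x n, S is n x p
    V1, V2 = V[:, :p], V[:, p:]
    # Particular solution: C^T = S1^T V1^T, so a0 = V1 S1^{-T} c satisfies C^T a0 = c.
    a0 = V1 @ np.linalg.solve(S[:p, :].T, c)
    # Reduced, unconstrained problem in a_tilde with matrix J V2.
    a_tilde, *_ = np.linalg.lstsq(J @ V2, y - J @ a0, rcond=None)
    return a0 + V2 @ a_tilde

rng = np.random.default_rng(1)
J = rng.standard_normal((10, 4))
y = rng.standard_normal(10)
C = np.ones((4, 1))            # one constraint: the parameters sum to c
c = np.array([1.0])
a_hat = constrained_lstsq(J, y, C, c)
```

In an iterative solver the product with $J V_2$ would be applied in two stages rather than formed explicitly, as noted above.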

3  Self-calibration  problems  in  co-ordinate  metrology 

Co-ordinate metrology is concerned with defining the geometry of two and three dimensional
artefacts from measurements of the co-ordinates of points related to the surface
of the artefacts. It is a key discipline in quality and process control in manufacturing
industry. In a (conventional) co-ordinate measuring machine (CMM) with three mutually
orthogonal linear axes, the position of the probe tip centre is inferred from scale
readings on each of the three machine axes. In practice, CMMs have imperfect geometry
with respect to the straightness of the axes, the squareness of pairs of axes and rotations
describing roll, pitch and yaw, and these systematic errors have to be taken into account
if the accuracy potential of the CMM is to be more fully realised. Two approaches can
be adopted to nullify the effect of these systematic errors. The first, error mapping,
involves performing a set of experiments to characterise as completely as possible the
error behaviour of the instrument and then using error correction software to produce more
accurate co-ordinate estimates. The disadvantages of this approach are, firstly, that the set
of experiments is expensive to perform and, secondly and more importantly, that the error
behaviour of the CMM is likely to drift so that, for example, an error correction valid on
Monday will only be partially valid on Friday and may be of limited value a month later.
The second approach, self-calibration, attempts to use any approximate symmetries,
rotational or translational, of the artefact so that systematic errors associated with the
measuring system are identified as part of the measurement process [4]. The advantage
of this method is that the effect of systematic error behaviour of the instrument is
cancelled out and the accuracy of the measurements is limited only by the smaller, random
component.

3.1  Calibration  of  reference  artefacts  in  2-dimensions 

As an example, we consider the accurate calibration of 2-dimensional artefacts by a two
dimensional CMM. The artefacts define the location of targets nominally aligned in a
grid pattern. Let $y_j$, $j = 1, \ldots, n_Y$, be the locations of the targets in a fixed frame of
reference, and let

$$y_{j,k} = T(y_j, t_k)$$

be the location of the $j$th target in the $k$th measuring position. Here, the roto-translation
$T$ is specified by three parameters $t$ defining the translation vector and angle of rotation.

We suppose the systematic error of the two dimensional CMM can be expressed as

$$x^* = x^*(x, b) = x + e(x, b),$$

where $x^*$ are the true point co-ordinates, $x$ are the indicated point co-ordinates output by
the machine and $e(x, b)$ is the error correction term depending on $x$ and error parameters
$b$. For instance, suppose the model describes scale and orthogonality errors so that, in
terms of the individual co-ordinates $(x, y)$ and $(x^*, y^*)$,

$$x^* = x(1 + b_1) + y(1 + b_2) \sin b_3, \qquad y^* = y(1 + b_2) \cos b_3.$$
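For concreteness, the scale and orthogonality model can be evaluated as in the following sketch. The function name and the numerical values of the error parameters are purely illustrative assumptions.

```python
import math

def correct_point(x, y, b1, b2, b3):
    """Map indicated co-ordinates (x, y) to corrected co-ordinates (x*, y*).

    b1, b2 are relative scale errors of the two axes and b3 is the
    out-of-squareness angle in radians.
    """
    x_star = x * (1.0 + b1) + y * (1.0 + b2) * math.sin(b3)
    y_star = y * (1.0 + b2) * math.cos(b3)
    return x_star, y_star

# Hypothetical values: 10 ppm and -20 ppm scale errors, perfectly square axes.
xt, yt = correct_point(100.0, 50.0, 1e-5, -2e-5, 0.0)
```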

If $x_i$ is the measurement of the $j$th target with the artefact in the $k$th position then the
associated observation equation is

$$x_i + e(x_i, b) = y_{j,k} + \epsilon_i. \tag{3.1}$$

Given a set of such measurements $\{x_i\}_{i=1}^{m_X}$ and associated index functions $(j(i), k(i))$
specifying the targets and artefact positions, estimates of the model parameters can be
determined by solving a non-linear least squares problem

$$\min_{\{y_j\}, \{t_k\}, b} \sum_{i=1}^{m_X} f_i^T f_i,$$

where $f_i(y_{j(i)}, t_{k(i)}, b) = x_i + e(x_i, b) - y_{j(i),k(i)}$.

The model involves three sets of parameters: the target locations $\{y_j\}$, transformation
parameters $\{t_k\}$ and the error parameters $b$. Each observation equation depends
on only one target and one transformation, so that the Jacobian matrix $J$ of partial
derivatives can be ordered to have a block-angular structure [2]

$$J = \begin{bmatrix} K_1 & & & & J_1 \\ & K_2 & & & J_2 \\ & & \ddots & & \vdots \\ & & & K_{n_Y} & J_{n_Y} \end{bmatrix},$$

where $K_j$ corresponds to the parameters $y_j$ and the border blocks $\{J_j\}$ correspond to
the border parameters $a = \{\{t_k\}, b\}$. The frame of reference for the targets $\{y_j\}$ can be
specified by applying three appropriate linear equality constraints on the transformation
parameters $\{t_k\}$.

While scale and orthogonality errors are often major contributors to the systematic
error behaviour of a CMM, there is no guarantee, nor does experience show, that they
explain the full extent of the behaviour. For this reason, more comprehensive models have
been developed [3, 9]. However, they all depend on the approximation of actual behaviour
by empirical functions such as polynomials and the adequacy of the approximation is
often difficult and expensive to evaluate. If, instead, we always rotate and translate the
artefact according to the symmetries of the reference artefact so that the targets are
always located (nominally) at a subset of a fixed grid of points in the CMM’s working
volume, then measurements are made at a finite number of machine locations. To the
$l$th location we associate a machine error $e_l$. If the $i$th measurement is made at the $l$th
location then the observation equation corresponding to (3.1) is

$$x_i + e_l = y_{j,k} + \epsilon_i.$$

The advantage of this error model is that it entails no significant approximation: the
systematic errors are modelled exactly. An apparent disadvantage is that there are likely
to be as many error parameters as target parameters, giving rise to a sparsity structure
in the Jacobian matrix for which direct, structure-exploiting methods provide relatively
minor efficiency gains. Figure 1 shows on the left the sparsity structure of the Jacobian
matrix $J$ associated with the measurement of a 5 x 5 hole plate in eight positions, the
first four corresponding to rotations by 0, 90, 180 and 270 degrees, the second four
incorporating a translation as well as a rotation. In each position the locations of the
targets $y_j$ are measured in order. The nonzero elements of the matrix are represented
by a dot. The first (second) 50 columns correspond to the derivatives with respect to
the machine error parameters $e_l$ (target parameters $y_j$) and the last 24 correspond to
the eight sets of transformation parameters $t_k$. On the right the sparsity structure of the
triangular factor of $J$ is illustrated and shows the substantial fill-in that occurs.

[Figure 1 plot omitted; axis label: nz = 1784.]

Fig. 1. Sparsity structure of the transpose of the Jacobian matrix associated with the
measurement of a 5 x 5 hole plate in eight positions.

In the next two sections, we describe approaches for dealing efficiently with
block-angular and more general sparse-block structure.

4  Algorithms  for  block-angular  systems 

We consider non-linear least squares problems where the optimisation parameters can
be partitioned into two sets $\eta = \{y_j\}_{j=1}^{n_Y}$ and $a$, and such that each observation equation
involves $a$ and at most one set of parameters $y_j$. Corresponding to (2.1), we have instead
an objective function of the form

$$F(\eta, a) = f_0^T(a) f_0(a) + \sum_{j=1}^{n_Y} f_j^T(y_j, a) f_j(y_j, a).$$

The associated Jacobian matrix $J$ and its triangular factor $R$ can be arranged to have
the form

$$J = \begin{bmatrix} K_1 & & & & J_1 \\ & K_2 & & & J_2 \\ & & \ddots & & \vdots \\ & & & K_{n_Y} & J_{n_Y} \\ & & & & J_0 \end{bmatrix}, \qquad
R = \begin{bmatrix} R_1 & & & & B_1 \\ & R_2 & & & B_2 \\ & & \ddots & & \vdots \\ & & & R_{n_Y} & B_{n_Y} \\ & & & & R_0 \end{bmatrix}.$$

The nonzero blocks of the matrix $R$ can be stored compactly in a vector $r$, row by row.

Efficient updating strategies for such triangular factors have been incorporated into
a non-linear least-squares solver to deal with block-angular problems. It is assumed that
the Jacobian matrix is composed of $n_B$ blocks of rows, with the $i$th block depending on
at most one set of parameters $y_j$, $j = j(i)$. The user is required to supply a function and
gradient evaluation module that, given $\eta$, $a$ and $1 \le i \le n_B$, returns $j = j(i)$ and

$$\begin{cases} f_i(a), \ J_i, & j = 0, \\ f_i(y_j, a), \ J_i, \ K_i, & j > 0. \end{cases}$$

For each $i$, the triangular factor and right-hand side vector are updated by the $i$th block
of rows:

$$\begin{bmatrix} R_{j(i)} & B_{j(i)} \\ & R_0 \\ K_i & J_i \end{bmatrix} \mapsto \begin{bmatrix} R_{j(i)} & B_{j(i)} \\ & R_0 \end{bmatrix} \quad (j(i) > 0), \qquad
\begin{bmatrix} R_0 \\ J_i \end{bmatrix} \mapsto \begin{bmatrix} R_0 \end{bmatrix} \quad (j(i) = 0).$$

Linear equality constraints on the border parameters $a$ implemented using the orthogonal
projection approach can be incorporated by setting $J_i := J_i V_2$ at the appropriate stage.
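A minimal sketch of a structured factorisation of this kind is given below for the linear step $Jp = -f$. It is not the solver described above: it assumes one block of rows per target, folds the rows left over from each block into the border factor $[R_0 \mid q_0]$ by a dense QR factorisation, and the names `fold_rows` and `block_angular_step` are hypothetical.

```python
import numpy as np

def fold_rows(R0q, rows):
    """Update the border factor [R_0 | q_0] with further rows [J | rhs]."""
    n_a = R0q.shape[0]
    R = np.linalg.qr(np.vstack([R0q, rows]), mode="r")
    return R[:n_a, :]          # a dropped row only carries residual norm

def block_angular_step(blocks, n_a, border=None):
    """blocks: list of (K_j, J_j, f_j); border: optional (J_0, f_0).

    Returns ([p_1, ..., p_nY], p_a) solving J p = -f in the least squares sense.
    """
    R0q = np.zeros((n_a, n_a + 1))
    stored = []                                   # (R_j, B_j, q_j) per block
    for K, J, f in blocks:
        k = K.shape[1]
        Q, R = np.linalg.qr(K, mode="complete")
        T = Q.T @ np.column_stack([J, -f])        # transform border block and rhs
        stored.append((R[:k, :], T[:k, :n_a], T[:k, n_a]))
        R0q = fold_rows(R0q, T[k:, :])            # leftover rows involve only a
    if border is not None:
        J0, f0 = border
        R0q = fold_rows(R0q, np.column_stack([J0, -f0]))
    p_a = np.linalg.solve(R0q[:, :n_a], R0q[:, n_a])
    p_y = [np.linalg.solve(Rj, qj - Bj @ p_a) for Rj, Bj, qj in stored]
    return p_y, p_a

rng = np.random.default_rng(2)
n_a = 3
blocks = [(rng.standard_normal((6, 2)), rng.standard_normal((6, n_a)),
           rng.standard_normal(6)) for _ in range(3)]
border = (rng.standard_normal((2, n_a)), rng.standard_normal(2))
p_y, p_a = block_angular_step(blocks, n_a, border)
```

Only the small blocks $R_j$, $B_j$ and the border factor are ever held, so the full Jacobian is never assembled.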

5  Algorithms  for  sparse-block  matrices 

Let the $m \times n$ matrix $S$ be composed of $n_S$ submatrices $S_k$ of dimension $m_k \times n_k$. We
assume that $S_k$ is stored (column-wise or row-wise) as a column vector $s_k$. The information
in $S$ can be encoded in a column vector $s_I$ and an indexing set $I_S$ such that
$I_S(1:5, k) = (i_k, j_k, m_k, n_k, l_k)^T$, where $(i_k, j_k)$ specifies the location of $S_k(1,1)$ in $S$ and
$l_k$ indicates that $s_k = s_I(l_k : l_k + m_k n_k - 1)$. Blocks of such matrices can be easily
represented by concatenating the $s$-vectors and index matrices $I_S$ and performing some
trivial index modifications. Matrix-vector multiplications of the form $y := \alpha S x + \beta y$ are
easily implemented through a sequence of full matrix multiplications: $y := \beta y$, followed
by

$$y(i_k : i_k + m_k - 1) := y(i_k : i_k + m_k - 1) + \alpha S_k x(j_k : j_k + n_k - 1),$$

$k = 1, \ldots, n_S$. A similar scheme calculates $x := \alpha S^T y + \beta x$. The storage and
multiplication scheme can be modified to take into account the type or structure of the
submatrices $S_k$.
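The storage scheme and the two products can be sketched as follows. The sketch assumes column-wise storage of each block and, unlike the 1-based description above, uses 0-based indices; the function names and the small test matrix are assumptions made for illustration.

```python
import numpy as np

def pack_blocks(blocks):
    """blocks: list of (i, j, S_k), with (i, j) the 0-based position of S_k[0, 0]."""
    parts, index, l = [], [], 0
    for i, j, Sk in blocks:
        m_k, n_k = Sk.shape
        parts.append(Sk.ravel(order="F"))        # column-wise storage of S_k
        index.append((i, j, m_k, n_k, l))
        l += m_k * n_k
    return np.concatenate(parts), np.array(index).T   # s vector, 5 x n_S index set

def sb_matvec(s, I_S, x, y, alpha=1.0, beta=0.0):
    """Return alpha * S x + beta * y using one dense product per block."""
    y = beta * np.asarray(y, dtype=float)
    for i, j, m_k, n_k, l in I_S.T:
        Sk = s[l:l + m_k * n_k].reshape((m_k, n_k), order="F")
        y[i:i + m_k] += alpha * (Sk @ x[j:j + n_k])
    return y

def sb_rmatvec(s, I_S, y, x, alpha=1.0, beta=0.0):
    """Return alpha * S^T y + beta * x."""
    x = beta * np.asarray(x, dtype=float)
    for i, j, m_k, n_k, l in I_S.T:
        Sk = s[l:l + m_k * n_k].reshape((m_k, n_k), order="F")
        x[j:j + n_k] += alpha * (Sk.T @ y[i:i + m_k])
    return x

rng = np.random.default_rng(4)
blocks = [(0, 0, rng.standard_normal((2, 2))),
          (2, 1, rng.standard_normal((3, 2))),
          (1, 3, rng.standard_normal((2, 1)))]
s, I_S = pack_blocks(blocks)
x0, y0 = rng.standard_normal(4), rng.standard_normal(5)
y1 = sb_matvec(s, I_S, x0, y0, alpha=2.0, beta=0.5)
x1 = sb_rmatvec(s, I_S, y0, x0, alpha=-1.0, beta=1.0)
```

These two routines are exactly the callbacks that an iterative least squares solver requires.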

To implement linear equality constraints, it is required to perform matrix multiplication
by a submatrix $V_2$ of the orthogonal factor of the constraint matrix $C$. A simple
scheme can be implemented using the LAPACK routines DGEQRF (orthogonal
factorisation) and DORMQR (matrix multiplication by an orthogonal matrix stored as a
product of Householder matrices) [8].

Fig. 2. Residual errors associated with the first 1000 observations for models a) with
no error separation (dots) and b) with error separation.


We  have  implemented  a  non-linear  least  squares  solver  for  sparse-block  systems.  The 
user  is  required  to  supply  a  module  that  takes  as  input  the  current  estimate  a  of  the 
optimisation  parameters  and  outputs  the  function  values  f(a)  and  the  Jacobian  matrix 
stored in sparse-block form $(s_I, I_S)$. The solver implements a Gauss-Newton approach
using  the  LSQR  solver  to  find  the  Gauss-Newton  step  and  caters  in  a  straightforward 
way  for  linear  equality  constraints.  The  solver  has  been  successfully  tested  in  a  number 
of  self-calibration  problems.  For  example,  it  was  used  recently  in  the  calibration  of  a 
13  x  13  grid  of  targets  on  a  glass  plate  by  a  CMM  with  an  optical  probing  system.  The 
problem  involved  approximately  15,000  observation  equations  in  over  800  optimisation 
parameters  and  was  solved  in  a  few  tens  of  seconds  using  a  standard  laboratory  PC  (450 
MHz).  The  advantage  of  the  error  separation  model  is  illustrated  in  Figure  2  which  shows 
the  residual  errors  associated  with  the  first  1000  observations  for  models  a)  with  no  error 
separation  (dots)  and  b)  with  error  separation.  The  fit  for  the  error  separation  model  is 
much  superior.  The  practical  metrological  consequence  of  adopting  the  enhanced  model 
is  that  uncertainties  associated  with  the  target  locations  can  be  reduced  by  a  factor 
of  five.  Importantly,  because  the  model  is  a  realistic  approximation  of  the  measuring 
system,  we  can  have  confidence  in  the  uncertainty  estimates  derived  from  the  model. 

6  Concluding  remarks 

The move to more accurate measurement systems has led to more comprehensive models
of the measuring instrument and its interaction with the physical quantity being measured.
These models include parameters that describe properties of the instrument and
those of the measurand. The aim of self-calibration experiments is to determine as much
as possible about both sets of parameters from a set of measurement experiments. For
models with a small to modest set of parameters, a full matrix approach may be acceptable.
For larger systems, exploitation of sparsity structure in the defining equations is
highly desirable and often a stark necessity if the computations are to be made in an
acceptable time using the computing resources to hand. The exploitation of block-angular
structure has been well-known and well-used in some areas of metrology. The supporting
numerical technology based on structured orthogonal factorisations is mature, compact
and easily implemented using standard numerical linear algebra. However, these
techniques could be applied more widely in metrology, making feasible approaches that have
to be rejected if full matrix methods only are to be used.

The  use  of  sparse  matrix  techniques  is  relatively  rare  within  metrology.  We  have 
attempted  to  show  here  that  in  self-calibration  problems  in  dimensional  metrology,  they 
allow  us  to  develop  improved  models  that  provide  vastly  superior  fits  to  the  data,  with 
corresponding  improvements  in  the  evaluated  uncertainties  in  the  fitted  parameters.  The 
supporting numerical technology is maturing and accessible.

Abstract

As measurement uncertainties are closely tied up with error models, it might be of
interest to review a model which the author assigns to “Metrological Statistics”. Given
that the random errors are normally distributed, the experimentalist could either refer to
B. L. Welch’s concept of “effective degrees of freedom” or to the multidimensional
Fisher-Wishart distribution density. In the first case, different numbers of repeated
measurements are admissible; in the latter it is strictly required to have equal numbers of
repeated measurements. In error propagation, however, only the latter mode of action opens up
the possibility of designing confidence intervals according to Student and confidence
ellipsoids according to Hotelling. Another point of view, closely linked to the choice of the
numbers of repeated measurements, refers to the customary practice of attributing equal
rights to statistical expectations and empirical estimators. However, the Fisher-Wishart
distribution density suggests using only the information which is realistically accessible to
experimentalists, namely empirical estimators. For the handling of unknown systematic
errors, either the existence of a (rectangular) distribution density may be assumed or,
and this is proposed here, they may be classified as time-constant quantities, biasing
expectations and suspending many of the otherwise well-established tools and procedures
of error calculus.


1  Introduction 

The  joint  propagation  of  random  errors  and  unknown  systematic  errors  currently  places 
the  experimentalist  in  the  following  dilemma. 

In regard to the propagation of random errors, there are, at least in principle, two
different choices. If one is willing to accept unequal numbers of repeated measurements
of the physical quantities to be combined within a given function, one has, in order to
express the influence of random errors, to resort to B. L. Welch’s sophisticated concept
of so-called numbers of effective degrees of freedom [8]. However, this procedure is tied
up with difficulties: it is restricted to independent variables.

Though B. L. Welch’s concept completely exhausts the information implied in measured
data, unfortunately, from a metrological point of view, it is cumbersome to handle
and obstructs the view of existing simpler procedures. On the other hand, if the
experimentalist preferred equal numbers of repeated measurements, he would, if need
be, have to give away part of his information, namely that which is carried by the
excessive numbers of repeated measurements of the variables involved. Up to now, the
disregarding of excessive numbers has been regarded as unfavourable. In spite of this view,
just this precaution opens up a toolbox of applied statistics hitherto closed to
metrologists, as only with equal numbers of repeated measurements is the experimentalist in
a position to call upon the standard model of statistics for jointly normally distributed
random variables, i.e. the Fisher-Wishart density [3]. The advantages gained in that
way outweigh by far the “lost information”, as relatively few repeated measurements of
experimental set-ups, operating in a stationary mode, are able to locate accurately the
respective physical quantities. After all, in error propagation the experimentalist may
define confidence intervals according to Student (Gosset) including any number of
variables. In least squares, he may even establish multidimensional confidence intervals, and
last but not least, certain problems of classical error calculus, such as the Fisher-Behrens
problem, no longer arise.

In regard to the interpretation and propagation of unknown systematic errors, the
situation is not simpler. Let us assume that an unknown systematic error $f$, constant in
time, is confined to an interval of the kind$^1$

$$-f_s \le f \le f_s, \qquad f_s > 0. \tag{1.1}$$

Now, the experimentalist may either assign a postulated probability density to $f$, usually
a rectangular density [7],

$$p(f) = \frac{1}{2 f_s}, \tag{1.2}$$

or he may set without exception

$$f = \text{constant}, \tag{1.3}$$

where $f$ lies anywhere within (1.1). The latter interpretation introduces biased
estimators, leading to a break-down of many procedures of error calculus otherwise
well-established.

Seen mathematically, both interpretations should be justified. In the case of (1.2),
the combination of random and systematic errors should be carried out geometrically;
in the case of (1.3), arithmetically. Regarding (1.3), the author suggests adding
linearly Student’s confidence intervals to appropriately designed worst-case estimates of
the propagated systematic errors, and no probability statements should be associated
with so-defined overall uncertainties.

2  Error  propagation 

The fundamental error equations of Metrological Statistics are given as follows [4]. Let
$x_0$ designate the true value of the physical quantity $x$ to be measured. Furthermore, let
$\varepsilon_l$ be the random error and $f_x = \text{constant}$ the unknown systematic error corresponding
to (1.1). We then have

$$x_l = x_0 + \varepsilon_l + f_x, \qquad l = 1, \ldots, n. \tag{2.1}$$

1 Should the interval be unsymmetrical to zero, it could be symmetrized by subtracting the halved sum of the
upper and lower boundary; the same quantity would have to be subtracted from the data.

Let $\mu_x = x_0 + f_x$ be the expectation of the random variable $X = \{x_1, x_2, \ldots, x_n\}$, so
that the $x_l$ are some of its realizations. We then find

$$x_l = \mu_x + \varepsilon_l, \qquad l = 1, \ldots, n. \tag{2.2}$$

Furthermore, let $\bar x = \frac{1}{n} \sum_{l=1}^n x_l$ denote the arithmetic mean. We then have the useful
identities

$$x_l = x_0 + (x_l - \mu_x) + f_x, \qquad \bar x = x_0 + (\bar x - \mu_x) + f_x. \tag{2.3}$$

While the arithmetic mean is biased, the empirical variance

$$s_x^2 = \frac{1}{n-1} \sum_{l=1}^n (x_l - \bar x)^2 \tag{2.4}$$

is not. For the time being, let us consider just two quantities to be measured, $x$ and $y$.
As robust and simple uncertainty assessments are a matter of linearization, the overall
uncertainty $u_\phi$ of a given function $\phi(\bar x, \bar y)$ is proposed to be [5]

$$u_\phi = \frac{t_{S,P}(n-1)}{\sqrt{n}} \sqrt{\left(\frac{\partial \phi}{\partial \bar x}\right)^2 s_x^2 + 2 \left(\frac{\partial \phi}{\partial \bar x}\right)\left(\frac{\partial \phi}{\partial \bar y}\right) s_{xy} + \left(\frac{\partial \phi}{\partial \bar y}\right)^2 s_y^2} + \left|\frac{\partial \phi}{\partial \bar x}\right| f_{s,x} + \left|\frac{\partial \phi}{\partial \bar y}\right| f_{s,y}, \tag{2.5}$$

where $t_{S,P}(n-1)$ is the Student-factor corresponding to a confidence level $P$. We
distinctly see how the empirical covariance

$$s_{xy} = \frac{1}{n-1} \sum_{l=1}^n (x_l - \bar x)(y_l - \bar y)$$

enters the empirical variance of the $\phi(x_l, y_l)$; $l = 1, \ldots, n$, given by

$$s_\phi^2 = \frac{1}{n-1} \sum_{l=1}^n \left[\phi(x_l, y_l) - \phi(\bar x, \bar y)\right]^2 = \left(\frac{\partial \phi}{\partial \bar x}\right)^2 s_x^2 + 2 \left(\frac{\partial \phi}{\partial \bar x}\right)\left(\frac{\partial \phi}{\partial \bar y}\right) s_{xy} + \left(\frac{\partial \phi}{\partial \bar y}\right)^2 s_y^2.$$

The final result

$$\phi(\bar x, \bar y) \pm u_\phi \tag{2.6}$$

is expected to localize the true value $\phi(x_0, y_0)$ with “reasonable certainty”, but no
proper confidence statement should be added, as $u_\phi$ is a mixture of a statistical and
a non-statistical component. The last term in (2.5) may overestimate the uncertainty;
on the other hand, linearization errors have been neglected. After all, this uncertainty
statement should fulfil the prerequisite to be safe, robust and simple.
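A small numerical sketch of the two-variable uncertainty (2.5) follows. It assumes that the Student-factor is supplied by the user from tables (here approximately 4.303 for $P = 95\%$ and $n - 1 = 2$); the function name, the data and the bounds $f_{s,x}$, $f_{s,y}$ are invented for illustration.

```python
import math

def overall_uncertainty(xs, ys, dphi_dx, dphi_dy, t_factor, fs_x, fs_y):
    """Overall uncertainty of phi(x_bar, y_bar) for n paired repeated measurements.

    dphi_dx, dphi_dy are the partial derivatives evaluated at the means;
    fs_x, fs_y bound the unknown systematic errors of x and y.
    """
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
    s_yy = sum((y - y_bar) ** 2 for y in ys) / (n - 1)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    # Linearized empirical variance of phi, including the covariance term.
    s_phi2 = dphi_dx ** 2 * s_xx + 2.0 * dphi_dx * dphi_dy * s_xy + dphi_dy ** 2 * s_yy
    random_part = t_factor / math.sqrt(n) * math.sqrt(max(s_phi2, 0.0))
    systematic_part = abs(dphi_dx) * fs_x + abs(dphi_dy) * fs_y
    return random_part + systematic_part

# Illustrative case phi = x + y, so both partial derivatives equal 1.
u = overall_uncertainty([1.0, 2.0, 3.0], [2.0, 4.0, 6.0],
                        dphi_dx=1.0, dphi_dy=1.0,
                        t_factor=4.303, fs_x=0.1, fs_y=0.2)
```

Because the two series are fully correlated in this example, omitting the covariance term would noticeably underestimate the random component.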

If there are $m$ quantities to be measured, we replace the notation $x, y$ by $x_1, x_2, \ldots, x_m$.
Then the overall uncertainty $u_\phi$ of the final result

$$\phi(\bar x_1, \bar x_2, \ldots, \bar x_m) \pm u_\phi$$

is given by

$$u_\phi = \frac{t_{S,P}(n-1)}{\sqrt{n}} \sqrt{\sum_{i,j=1}^m \frac{\partial \phi}{\partial \bar x_i} \frac{\partial \phi}{\partial \bar x_j} s_{ij}} + \sum_{i=1}^m \left|\frac{\partial \phi}{\partial \bar x_i}\right| f_{s,i}. \tag{2.7}$$

When (2.5) and (2.7) are compared, it becomes obvious that the proposed formalism
of error propagation works like a building kit, perspicuous and easy to handle. There are
arguments against (2.7), in particular that an experimentalist who wishes to design his
uncertainties in this way would have to know the complete set of repeated measurements,
in other words, the complete empirical variance-covariance matrix

$$s = (s_{ij}), \qquad i, j = 1, 2, \ldots, m, \tag{2.8}$$

of the input data. Arguably, this is true, but in the days of computers and the internet
such a challenge should no longer provoke difficulties worth mentioning. Another
argument, that (2.7) might overestimate overall uncertainties, should be judged in
view of the unique role of metrology in science. Standing “between” theory and
experiment, metrology pursues the idea of reliably localizing the value of the physical quantity
in question.

3  Least  squares 

Let

$$A \beta \approx x \tag{3.1}$$

be a linear system of equations to be adjusted. Here, $A$ designates the $m \times r$ design
matrix of rank $r$, $\beta$ the $r \times 1$ vector of unknowns and, finally, $x$ the $m \times 1$ vector of
the observations or input data. We assume $m > r$. The idea of least squares is of purely
geometrical origin.

In what follows, $A^T$ denotes the transpose of $A$. The idea is to project the vector $x$
by means of a projection operator

$$P = A (A^T A)^{-1} A^T \tag{3.2}$$

orthogonally onto the column space of the matrix $A$, and the result is

$$\bar\beta = (A^T A)^{-1} A^T x. \tag{3.3}$$

As the solution vector $\bar\beta$ is linear in the input data, the transfer of (2.7) to its components
$\bar\beta_k$, $k = 1, \ldots, r$, is straightforward.

Clearly, the orthogonal projection is in no way dependent on the error model implied.
In contrast to this, the latter turns out to be crucial in regard to uncertainty assessments.
Let us consider a set of single observations

$$x_i = x_{0,i} + \varepsilon_i + f_i = x_{0,i} + (x_i - \mu_i) + f_i, \qquad i = 1, \ldots, m, \tag{3.4}$$

being the input data, where $E\{X_i\} = \mu_i$. Writing (3.4) in vector form, we have

$$x = x_0 + (x - \mu) + f, \tag{3.5}$$

where

$$x = (x_1, \ldots, x_m)^T, \quad x_0 = (x_{0,1}, x_{0,2}, \ldots, x_{0,m})^T, \quad \mu = (\mu_1, \mu_2, \ldots, \mu_m)^T, \quad f = (f_1, f_2, \ldots, f_m)^T, \quad -f_{s,i} \le f_i \le f_{s,i}.$$

Given equal variances $\sigma^2 = E\{(X_i - \mu_i)^2\}$, the minimized sum $Q_{\min}$ of squared
residuals of the adjusted system (3.1) should yield, according to quite familiar procedures,
an estimator $s^2 \approx \sigma^2$. However, from

$$Q_{\min} = (x - Px)^T (x - Px),$$

we obtain something different, namely

$$E\{Q_{\min}\} = \sigma^2 (m - r) + f^T f - f^T P f. \tag{3.6}$$

As we see, even the simplest of all associated least squares procedures breaks down,
should the model of time-constant unknown systematic errors be accepted. At the same
time the related basic tool linked to $Q_{\min}$ and frequently used, namely the test of
consistency of the input data based on the criterion

$$Q_{\min} / \sigma^2 \approx m - r,$$

breaks down as well. Indeed, during many decades, time and again, the observation

$$Q_{\min} / \sigma^2 \gg m - r$$

has stunned experimentalists [2], so that, in the adjustments of the fundamental physical
constants, even the abolition of least squares has been considered [1]. However, in view
of (3.6), these observations are understandable.
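The step leading to (3.6) can be spelled out as follows. The derivation is a sketch under two assumptions that are not stated explicitly above: the true values satisfy the system exactly, $x_0 = A\beta_0$, and the random errors $\varepsilon = x - \mu$ are uncorrelated with common variance $\sigma^2$.

```latex
% Assumptions: x_0 = A \beta_0, E\{\varepsilon\} = 0, E\{\varepsilon \varepsilon^T\} = \sigma^2 I.
% I - P is symmetric and idempotent, (I - P) A = 0, and tr(P) = r.
\begin{align*}
x - Px &= (I - P)(x_0 + \varepsilon + f) = (I - P)(\varepsilon + f), \\
Q_{\min} &= (\varepsilon + f)^T (I - P)(\varepsilon + f), \\
E\{Q_{\min}\} &= E\{\varepsilon^T (I - P)\,\varepsilon\} + f^T (I - P) f
   = \sigma^2\,\mathrm{tr}(I - P) + f^T f - f^T P f \\
  &= \sigma^2 (m - r) + f^T f - f^T P f .
\end{align*}
```

The cross terms vanish because $E\{\varepsilon\} = 0$, and since $f^T (I - P) f \ge 0$ the systematic errors can only increase the expected value of $Q_{\min}$.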

After all, a least squares adjustment of biased input data requires arithmetic means

$$\bar x_i = x_{0,i} + (\bar x_i - \mu_i) + f_i, \qquad i = 1, \ldots, m, \tag{3.7}$$

so that the empirical variances and covariances

$$s_{ij} = \frac{1}{n-1} \sum_{l=1}^n (x_{il} - \bar x_i)(x_{jl} - \bar x_j), \qquad s_{ii} \equiv s_i^2, \tag{3.8}$$

are known a priori. Replacing (3.5) by

$$\bar x = x_0 + (\bar x - \mu) + f, \tag{3.9}$$

instead of (3.3), we find

$$\bar\beta = (A^T A)^{-1} A^T \bar x. \tag{3.10}$$

A matter of similar concern refers to the break-down of the Gauss-Markoff theorem. In
view of (3.9), the solution vector $\bar\beta$ is biased, so that the experimentalist is no longer in
a position to obtain a weight-matrix from the variance-covariance matrix of the input
vector $\bar x$. Consequently, simple, optimized adjustments, to which we are customarily
used, must be ruled out. Nevertheless, we may multiply (3.1) from the left with any
non-singular weighting matrix, e.g. with a diagonal one,

$$G = \operatorname{diag}\{g_1, g_2, \ldots, g_m\}, \qquad g_i = \frac{1}{u_{\bar x_i}}, \tag{3.11}$$

and adjust the weights $g_i$ by trial and error in order to find the shortest possible
uncertainty intervals. As has been shown, this method is also able to detect inconsistencies
among the input data [6]. Indeed, as a non-singular weight-matrix cannot shift the true
solution vector $\beta_0$, we are allowed to proceed this way.

To assign uncertainties to the components $\bar{\beta}_k$, $k = 1, \dots, r$, of the solution vector $\bar{\beta}$, we refer to (2.7). To abbreviate the notation, we set in (3.10)

$$B = A (A^T A)^{-1}, \tag{3.12}$$

where the elements of the matrix $B$ will be designated by $b_{ik}$. Upon insertion of (3.9) into (3.10), we arrive at

$$\bar{\beta} = B^T x_0 + B^T (\bar{x} - \mu) + B^T f. \tag{3.13}$$

Evidently, $\beta_0 = B^T x_0$ is the true value of the estimator $\bar{\beta}$. Setting $\mu_{\bar{\beta}} = E\{\bar{\beta}\} = \beta_0 + B^T f$, we may define the theoretical variance-covariance matrix

$$E\{(\bar{\beta} - \mu_{\bar{\beta}})(\bar{\beta} - \mu_{\bar{\beta}})^T\},$$

which, however, remains numerically inaccessible. Consequently, the only thing we can do is to resort to the empirical variance-covariance matrix

$$s_{\bar{\beta}} = (s_{\bar{\beta}_k \bar{\beta}_{k'}}) = B^T s B, \quad k, k' = 1, 2, \dots, r, \tag{3.14}$$

whose elements are given by

$$s_{\bar{\beta}_k \bar{\beta}_{k'}} = \sum_{i,j=1}^{m} b_{ik} b_{jk'} s_{ij}, \quad s_{\bar{\beta}_k \bar{\beta}_k} \equiv s_{\bar{\beta}_k}^2. \tag{3.15}$$

Clearly, the $s_{ij}$ are the elements of the empirical variance-covariance matrix $s$ of the input data, as has been stated in (2.8) and (3.8).

These procedures presuppose, as has been pointed out, equal numbers of repeated measurements within each of the $m$ means (3.7). The components $\bar{\beta}_k$ of the solution vector may be written as

$$\bar{\beta}_k = \frac{1}{n} \sum_{l=1}^{n} \bar{\beta}_{kl} \quad \text{with} \quad \bar{\beta}_{kl} = \sum_{i=1}^{m} b_{ik} x_{il}, \quad k = 1, \dots, r. \tag{3.16}$$

Evidently, the $\bar{\beta}_{kl}$ are independent and normally distributed. Let $\mu_{\bar{\beta}_k}$ denote the expectations

$$\mu_{\bar{\beta}_k} = E\{\bar{\beta}_k\}, \quad k = 1, \dots, r, \tag{3.17}$$

of the $\bar{\beta}_k$. For any one of the $\bar{\beta}_k$,

$$\bar{\beta}_k - \frac{t_P(n-1)}{\sqrt{n}}\, s_{\bar{\beta}_k} \le \mu_{\bar{\beta}_k} \le \bar{\beta}_k + \frac{t_P(n-1)}{\sqrt{n}}\, s_{\bar{\beta}_k} \tag{3.18}$$

is a confidence interval according to Student, where $t_P(n-1)$ is the Student factor. This interval localizes $\mu_{\bar{\beta}_k}$ with confidence $P$.

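The components of the third term on the right-hand side of (3.13) are given by

$$f_{\bar{\beta}_k} = \sum_{i=1}^{m} b_{ik} f_i, \quad k = 1, \dots, r. \tag{3.19}$$

Worst-case estimates are

$$f_{s,\bar{\beta}_k} = \sum_{i=1}^{m} |b_{ik}|\, f_{s,i}, \quad k = 1, \dots, r. \tag{3.20}$$

Finally, the overall uncertainties $u_{\bar{\beta}_k}$ of the components of the solution vector $\bar{\beta}$, considered and employed individually, are proposed to be

$$u_{\bar{\beta}_k} = \frac{t_P(n-1)}{\sqrt{n}}\, s_{\bar{\beta}_k} + f_{s,\bar{\beta}_k}, \quad k = 1, \dots, r. \tag{3.21}$$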
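As an illustration of how (3.15), (3.20) and (3.21) combine, the following Python sketch evaluates the overall uncertainty of each component of the solution vector. It is only a sketch under stated assumptions: the function name, the argument layout (plain nested lists) and the numerical values are hypothetical, and the Student factor is supplied by the caller rather than computed.

```python
import math

def overall_uncertainties(B, s, f_s, t_factor, n):
    """Overall uncertainty of every component of the solution vector.

    B        : m x r matrix (list of rows) with elements b_ik
    s        : m x m empirical variance-covariance matrix of the input data
    f_s      : m bounds f_{s,i} on the unknown systematic errors
    t_factor : Student factor for n - 1 degrees of freedom (caller-supplied)
    n        : number of repeated measurements behind each mean
    """
    m, r = len(B), len(B[0])
    result = []
    for k in range(r):
        # Empirical variance of component k: sum_ij b_ik b_jk s_ij
        var_k = sum(B[i][k] * B[j][k] * s[i][j]
                    for i in range(m) for j in range(m))
        random_part = t_factor / math.sqrt(n) * math.sqrt(var_k)
        # Worst-case propagated systematic error: sum_i |b_ik| f_{s,i}
        systematic_part = sum(abs(B[i][k]) * f_s[i] for i in range(m))
        result.append(random_part + systematic_part)
    return result

# Hypothetical example: one unknown estimated as the plain mean of two inputs.
B = [[0.5], [0.5]]
s = [[0.04, 0.0], [0.0, 0.04]]
f_s = [0.1, 0.3]
u = overall_uncertainties(B, s, f_s, t_factor=2.262, n=10)
```

Note that the random and the systematic contributions are added linearly, not in quadrature, which is what distinguishes (3.21) from a purely statistical uncertainty budget.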


4  Uncertainty  spaces 

The component representation of (3.13),

$$\bar{\beta}_k = \beta_{0,k} + \sum_{i=1}^{m} b_{ik} (\bar{x}_i - \mu_i) + \sum_{i=1}^{m} b_{ik} f_i, \tag{4.1}$$

reveals the couplings between the least squares estimators. Those due to random errors may be expressed by means of Hotelling's density [3]. The last term on the right-hand side of (4.1),

$$f_{\bar{\beta}_k} = \sum_{i=1}^{m} b_{ik} f_i, \quad k = 1, \dots, r, \tag{4.2}$$

expresses the couplings due to systematic errors. The $r$ components $f_{\bar{\beta}_k}$ map the $m$-dimensional hypercuboid

$$-f_{s,i} \le f_i \le f_{s,i}, \quad i = 1, \dots, m, \tag{4.3}$$

onto the $r$-dimensional space, yielding a convex polytope. Both solids may be combined into an overall uncertainty space, resembling a "convex potato". Figures 1-3 show the confidence ellipsoid, the "security polytope" and the combination of both into an overall uncertainty space for the example of a least squares adjustment of a circle.

5  Conclusion 

As computer simulations reveal, the approach presented here leads to measurement uncertainties safeguarding physical objectivity in the sense that uncertainty intervals reliably locate the values of the physical quantities in question. With such a distinct statement, the traceability of units and standards will certainly be maintained.

Abstract

We consider the fitting of geometric elements, such as lines, planes, circles, cones, and cylinders, in such a way that the sum of distances or the maximal distance from the element to the data points is minimized. We refer to this kind of distance-based fitting as orthogonal distance regression or ODR. We present a separation of variables algorithm for $l_1$ and $l_\infty$ ODR fitting of geometric elements. The algorithm is iterative and allows the element to be given in either implicit form $f(x,\beta) = 0$ or in parametric form $x = g(t,\beta)$, where $\beta$ is the vector of shape parameters, $x$ is a 2- or 3-vector, and $t$ is a vector of location parameters. The algorithm may even be applied in cases, such as with ellipses, in which a closed form expression for the distance is either not available or is difficult to compute. For $l_1$ and $l_\infty$ fitting, the norm of the gradient is not available as a stopping criterion, as it is not continuous. We present a stopping criterion that handles both the $l_1$ and the $l_\infty$ case, and is based on a suitable characterization of the stationary points.

1  Introduction 

Let us be given $N$ points $\{z_i\}_{i=1}^{N} \subset \mathbb{R}^d$ and a geometric object $S$ in

• implicit form $\{x : f(x,\beta) = 0\}$ with a scalar function $f$, or

• parametric form $x = g(t,\beta)$ with a vector function $g$,

where the shape parameter vector $\beta \in C$ lies within a closed, convex subset $C$ of $\mathbb{R}^m$. Denote by

$$\phi_i(\beta) = \inf\{\|z_i - x_i\|_2 : x_i \text{ on } S\}$$

the distance of the point $z_i$ to the geometric object $S$. Let

$$\phi(\beta) = (\phi_1(\beta), \dots, \phi_N(\beta))^T$$

be the distance vector with norm

$$\Phi(\beta) = \|\phi(\beta)\|,$$

where $\|\phi(\beta)\|$ denotes either the $l_\infty$-norm

$$\Phi(\beta) = \max(\phi_1(\beta), \dots, \phi_N(\beta))$$

or the $l_1$-norm

$$\Phi(\beta) = \sum_{i=1}^{N} \phi_i(\beta).$$

We consider the problem:

Find $\beta \in C$ and points $\{x_i\}_{i=1}^{N}$ on $S$ such that $\Phi(\beta) = \|\phi(\beta)\|$ is minimal.

If the minimum is attained, each function $\phi_i(\beta) = \|z_i - x_i\|_2$ is minimal for the point $x_i \in S$. Then $z_i - x_i$ is orthogonal to $S$ for interior points of $S$, hence the term "orthogonal distance regression" or "ODR".

Nonlinear $l_1$ ODR problems are treated in Watson [10, 12]. A survey for linear problems is given in Zwick [13].

As stated, the problem has dimension $Nd + m$. In typical metrology applications, the data set is very large, so that a direct approach to the problem becomes computationally expensive. We use a separation of variables algorithm that was used in [2, 4] and Turner [9] for the $l_2$ ODR problem. Each iteration of our algorithm consists of two steps. In the first step, the foot points $x_i$ on $S$, i.e., the location parameters, are calculated for a fixed parameter vector $\beta$. These $d$-dimensional subproblems can be efficiently handled by trust region methods [3].

In the second step, a first order approximation of $\phi_i(\beta)$ is employed that can be given without explicit knowledge of the dependence of the optimal points $x_i(\beta)$ on $\beta$. At this stage, the norm of the correction to the parameter vector $\beta$ is limited by a trust region strategy. The correction can be computed by solving a linear programming problem. For general nonlinear minimax problems such methods were proposed in Madsen and Schjær-Jacobsen [6], Hald and Madsen [1] and Jonasson and Madsen [5].

Our convergence analysis follows the general approach given in Powell [8] and Moré [7]. However, to handle the $l_1$ and $l_\infty$ cases we cannot use the norm of the gradient as a stopping or convergence criterion, since the gradient is not continuous. Moreover, a necessary condition for a minimum is that the subgradient contains the zero functional, see, e.g., Watson [11]. In order to overcome this difficulty, we introduce a replacement for the norm of the gradient that serves both as a stopping criterion and as an essential tool in the convergence proof.

2  The  trust  region  algorithm 

At each iteration of our algorithm we solve the low-dimensional subproblems $(P_i)$ for $\beta = \beta_k$ for each fixed $i$, $i = 1, \dots, N$:

Minimize $\|z_i - x_i\|_2$ subject to $f(x_i,\beta) = 0$ or $x_i = g(t_i,\beta)$.

In order to apply the trust region method to $l_1$ and $l_\infty$ ODR we need a first order approximation $\psi_i(\beta,\alpha)$ to $\phi_i(\beta + \alpha)$. With appropriate regularity assumptions, this can be computed without knowledge of the dependence of the optimal points $x_i(\beta)$ on $\beta$ ([2], [4]). This means that the iterative improvement in $\beta$ is uncoupled from the calculations of $x_i(\beta)$, whereby a true first order approximation of the objective function is attained.

In the case of the implicit form $f(x,\beta) = 0$, the first order approximation $\phi_i(\beta + \alpha) = \psi_i(\beta,\alpha) + o(\alpha)$ is given by

$$\psi_i(\beta,\alpha) = \frac{\nabla_x f(x_i,\beta)^T (z_i - x_i) + \nabla_\beta f(x_i,\beta)^T \alpha}{\|\nabla_x f(x_i,\beta)\|_2} \tag{2.1}$$

as a first order approximation to the signed distance $\pm\phi_i(\beta + \alpha)$. For the parametric form $x = g(t,\beta)$, we have

$$\psi_i(\beta,\alpha) = \|z_i - x_i\|_2 - \frac{(z_i - x_i)^T \nabla_\beta g(t_i,\beta)\, \alpha}{\|z_i - x_i\|_2}. \tag{2.2}$$

Note that (2.1) makes sense even for points on the surface. For an orientable hypersurface in parametric form, the expression $(z_i - x_i)/\|z_i - x_i\|_2$ in (2.2) should be replaced by the unit normal for points on the surface.
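To make the implicit-form linearization concrete, the following Python sketch evaluates the quotient in (2.1), namely $[\nabla_x f^T (z_i - x_i) + \nabla_\beta f^T \alpha] / \|\nabla_x f\|_2$, for a circle. The choice of element, the implicit function $f(x,\beta) = (x_1 - a)^2 + (x_2 - b)^2 - r^2$ with $\beta = (a, b, r)$, and all function names are assumptions made for this illustration; for a circle the foot point is available in closed form, which is not the case for the ellipse treated later.

```python
import math

def circle_foot_point(z, beta):
    """Foot point of z on the circle with centre (a, b) and radius r."""
    a, b, r = beta
    dx, dy = z[0] - a, z[1] - b
    dist = math.hypot(dx, dy)  # assumes z does not coincide with the centre
    return (a + r * dx / dist, b + r * dy / dist)

def linearized_distance(z, beta, alpha):
    """First order approximation of the signed distance after beta -> beta + alpha."""
    a, b, r = beta
    x = circle_foot_point(z, beta)
    # Gradients of f(x, beta) = (x1 - a)^2 + (x2 - b)^2 - r^2 at the foot point
    grad_x = (2.0 * (x[0] - a), 2.0 * (x[1] - b))
    grad_beta = (-2.0 * (x[0] - a), -2.0 * (x[1] - b), -2.0 * r)
    numerator = (grad_x[0] * (z[0] - x[0]) + grad_x[1] * (z[1] - x[1])
                 + sum(g * da for g, da in zip(grad_beta, alpha)))
    return numerator / math.hypot(grad_x[0], grad_x[1])

# Hypothetical data point and circle: z lies at distance 1 outside the circle.
z = (3.0, 4.0)
beta = (0.0, 0.0, 4.0)
d0 = linearized_distance(z, beta, (0.0, 0.0, 0.0))        # current signed distance
d_radius = linearized_distance(z, beta, (0.0, 0.0, 0.5))  # radius enlarged by 0.5
d_centre = linearized_distance(z, beta, (0.3, 0.4, 0.0))  # centre moved towards z
```

Only the foot point and the gradients of $f$ at the foot point enter the computation; the dependence of the foot point on $\beta$ is never differentiated.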

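Denote by

$$\psi(\beta,\alpha) = (\psi_1(\beta,\alpha), \dots, \psi_N(\beta,\alpha))^T$$

the vector of the linearized distances and let

$$\Psi(\beta,\alpha) = \|\psi(\beta,\alpha)\| - \|\phi(\beta)\|.$$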

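The main algorithm:

• Step 0: An initial $\beta_0 \in C$, a trust region radius $\Delta_0 > 0$, and constants $0 < \mu < 1$ and $0 < \gamma_1 < 1 < \gamma_2$, $\bar{\Delta}$ are given. Set $k = 0$.

• Step 1: Minimize $\Psi(\beta_k,\alpha)$ subject to $\|\alpha\|_2 \le \Delta_k$ and $\beta_k + \alpha \in C$. Let $\alpha_k$ denote the solution with minimal norm.

• Step 2: If $\alpha_k = 0$, stop.

• Step 3: Compute

$$\rho_k = \frac{\Phi(\beta_k + \alpha_k) - \Phi(\beta_k)}{\Psi(\beta_k,\alpha_k)}.$$

• Step 4:

(1) Successful step. If $\rho_k > \mu$ set

$$\beta_{k+1} = \beta_k + \alpha_k$$

and choose $\Delta_{k+1}$ such that

$$\Delta_k \le \Delta_{k+1} \le \min(\gamma_2 \Delta_k, \bar{\Delta}). \tag{2.3}$$

(2) Unsuccessful step. Otherwise, set

$$\beta_{k+1} = \beta_k \quad \text{and} \quad 0 < \Delta_{k+1} \le \gamma_1 \Delta_k.$$

• Step 5: Increment $k$ by one and go to Step 1.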
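The control flow of Steps 0 to 5 can be sketched in Python on a deliberately tiny problem: the $l_\infty$ fit of the radius of a circle centred at the origin, so that there is a single shape parameter and $C$ is the whole real line. This toy problem, the function name and the default constants are assumptions made for illustration only. Because the signed distances $r_i - \beta$ are linear in $\beta$, the linearized subproblem of Step 1 has a closed-form solution (the midrange of the signed distances, clipped to the trust region) and no linear programming solver is needed; in the general case Step 1 would call such a solver.

```python
import math

def fit_radius_linf(points, beta0, delta0, mu=0.25, gamma1=0.5, gamma2=2.0,
                    delta_max=1.0, tol=1e-12, max_iter=100):
    """Minimax radius of a circle centred at the origin (one shape parameter)."""
    radii = [math.hypot(x, y) for x, y in points]

    def objective(beta):
        # Phi(beta): l_inf norm of the distances to the circle
        return max(abs(r - beta) for r in radii)

    beta, delta = beta0, delta0                       # Step 0
    for _ in range(max_iter):
        d = [r - beta for r in radii]                 # signed distances
        # Step 1: minimize max_i |d_i - alpha| subject to |alpha| <= delta.
        # The unconstrained minimizer is the midrange of d; clip it.
        alpha = max(-delta, min(delta, 0.5 * (max(d) + min(d))))
        if abs(alpha) <= tol:                         # Step 2
            break
        predicted = max(abs(di - alpha) for di in d) - objective(beta)  # Psi
        rho = (objective(beta + alpha) - objective(beta)) / predicted   # Step 3
        if rho > mu:                                  # Step 4 (1): successful
            beta += alpha
            delta = min(gamma2 * delta, delta_max)
        else:                                         # Step 4 (2): unsuccessful
            delta = gamma1 * delta
    return beta

# Hypothetical data: four points with radii 1.0, 1.2, 0.9 and 1.1.
points = [(1.0, 0.0), (0.0, 1.2), (-0.9, 0.0), (0.0, -1.1)]
radius = fit_radius_linf(points, beta0=0.5, delta0=0.1)
```

In this example the linearization is exact, so every step is successful and the trust region radius only grows; the unsuccessful branch is kept to mirror Step 4 (2).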

3  Global  convergence 

In an abstract setting our problem may be formulated as

Minimize $\Phi(\beta) = \|\phi(\beta)\|$ on a closed, convex set $C$.

To solve this problem, at each stage of the iteration we solve the following constrained, linearized problem:

Minimize $\Psi(\beta,\alpha)$ subject to $\beta + \alpha \in C$ and $\|\alpha\| \le \Delta$.

In order to get the linearization in our case, we solve the least distance subproblems $(P_i)$, $i = 1, \dots, N$, with a shape parameter $\beta$, and use (2.2) or (2.1).

For the purpose of characterizing stationary points, we introduce the quantity

$$V_1(\beta) = -\inf\{\Psi(\beta,\alpha) \mid \|\alpha\| \le 1,\ \beta + \alpha \in C\}.$$

Note that $V_1(\beta) \ge 0$, since $\Psi(\beta,0) = 0$.

By convexity, $V_1(\beta) = 0$ implies that $\alpha = 0$ is a solution of the linearized minimization problem. Madsen and Schjær-Jacobsen [6] have shown that the latter condition is equivalent to a condition given therein for the functional to have a stationary point. In order to prove Theorem 3.3 we prove a lemma that was given in a similar form for the $l_\infty$ case in Madsen and Schjær-Jacobsen [6] and Jonasson and Madsen [5]. We give a different proof that is applicable to both the $l_1$ and $l_\infty$ cases.

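Lemma 3.1 Let $V_1(\beta) \ge \epsilon$ and $\Delta \le \bar{\Delta}$. For the solution $\alpha$ of the linearized problem the estimate

$$\Psi(\beta,\alpha) \le -C \epsilon \Delta \tag{3.1}$$

holds, with a constant $C$ that depends only on $\epsilon$ and $\bar{\Delta}$.

Proof: According to the definition of $V_1(\beta)$ and the continuity of $\Psi$ there exists a feasible $\alpha_1$ with $\|\alpha_1\| \le 1$ such that

$$\Psi(\beta,\alpha_1) = -\epsilon.$$

Let $\alpha = t\alpha_1$, where $t = \min(1, \Delta)$. Since $\Psi(\beta,\alpha)$ is a convex function of $\alpha$, we get

$$\Psi(\beta,\alpha) \le (1 - t)\Psi(\beta,0) + t\Psi(\beta,\alpha_1) = -t\epsilon.$$

Since

$$t \ge \Delta \min(1, 1/\bar{\Delta})$$

we get the conclusion with $C = \min(1, 1/\bar{\Delta})$. □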

Proposition 3.2 For a minimum point,

$$V_1(\beta) = 0$$

holds.

Proof: Assume the contrary; then $V_1(\beta) = \epsilon > 0$ holds. According to the definition of $\Psi$ we have

$$\Phi(\beta + \alpha) = \Phi(\beta) + \Psi(\beta,\alpha) + o(\alpha).$$

By Lemma 3.1, we can find an $\alpha$ with $\|\alpha\| \le \Delta$ such that (3.1) holds. As in the proof of the Lemma, we may conclude that

$$\Phi(\beta + t\alpha) \le \Phi(\beta) - C \epsilon t \Delta + o(t\alpha)$$

for $0 < t < 1$. If we let $t \to 0$ we get a contradiction to the minimum property. □

Theorem 3.3 Either the algorithm ends in a finite number of steps, or a sequence $\beta_k$ is generated for which $\liminf_{k\to\infty} V_1(\beta_k) = 0$.

Proof: Assume the contrary. Then there exists $\epsilon > 0$ such that $V_1(\beta_k) \ge \epsilon$ holds for all $k$. By the definition of $\rho_k$ and the lemma, it follows that for a successful step

$$\Phi(\beta_{k+1}) \le \Phi(\beta_k) - \mu C \epsilon \Delta_k$$

and by the updating rule for $\Delta_{k+1}$ we get

$$\Delta_{k+1} \le c\,(\Phi(\beta_k) - \Phi(\beta_{k+1})),$$

with $c = 1/(\mu C \epsilon)$. Combining this inequality with the updating rule for an unsuccessful step yields

$$\Delta_{k+1} \le \gamma_1 \Delta_k + c\,(\Phi(\beta_k) - \Phi(\beta_{k+1})).$$

By summation and the monotonicity of $\Phi(\beta_k)$ it follows that for all $N$

$$\sum_{k=0}^{N} \Delta_k \le \frac{1}{1 - \gamma_1} \bigl(\Delta_0 + c\,(\Phi(\beta_0) - \Phi(\beta_N))\bigr).$$

Since this implies the convergence of $\sum \Delta_k$, we get $\lim \Delta_k = 0$. From $\|\alpha_k\| \le \Delta_k$ we obtain the convergence of $\beta_k$. From the definition of $\rho_k$ it then follows that $\lim \rho_k = 1$. But then the updating rule (2.3) implies that eventually $\Delta_{k+1} \ge \Delta_k$, which gives a contradiction. □

Theorem 3.4 (Global Convergence, cf. Moré [7], Powell [8]) Assume that $V_1(\beta)$ is uniformly continuous. Then either the algorithm ends in a finite number of steps, or a sequence $\beta_k$ is generated for which

$$\lim_{k\to\infty} V_1(\beta_k) = 0.$$

Proof: Assume the contrary. Then there exists an $\epsilon_1 > 0$ such that for each $k_0$ there exists a $k \ge k_0$ with

$$V_1(\beta_k) \ge \epsilon_1.$$

By Theorem 3.3 we can find an index $l > k$ such that

$$V_1(\beta_l) < \epsilon_1/2$$

($k_0$ will be determined later). We choose the smallest such $l$. As in the proof of Theorem 3.3, it follows that for a successful step with $k \le i < l$,

$$\|\beta_{i+1} - \beta_i\| \le \Delta_i \le 2c_1 (\Phi(\beta_i) - \Phi(\beta_{i+1})).$$

Clearly, this also holds for an unsuccessful step. This yields

$$\|\beta_l - \beta_k\| \le 2c_1 (\Phi(\beta_k) - \Phi(\beta_l)).$$

Since $\Phi(\beta_k)$ converges by monotonicity, we can make $\|\beta_l - \beta_k\|$ arbitrarily small for large enough $k_0$. By the uniform continuity of $V_1(\beta)$ we infer

$$|V_1(\beta_k) - V_1(\beta_l)| < \epsilon_1/2,$$

which is a contradiction. □


Fitting  of  geometric  elements 

4  A  numerical  example 

As an illustrative example, we fit an ellipse to data, given as coordinate pairs in $\mathbb{R}^2$. There are 24 data points and five components to the shape parameter vector (i.e., $n = 2$, $d = 2$, $m = 5$, $N = 24$). We used a standard parameterization involving a center $(x_0, y_0)$, the axes $(a, b)$, and a rotation angle $\theta$.

The output is shown below. The initial values for the parameters and the obtained parameters in three different norms are given in Table 1. In the $l_2$ case, we give as the error the root mean square error, in the $l_1$ case the mean absolute deviation, and in the $l_\infty$ case the maximum absolute deviation.


Fig. 1. $l_2$, $l_1$, and $l_\infty$-approximation.


                 x_0         y_0          a           b           θ (degrees)   Error
Initial values   0.4989881   -1.4262126   4.6719913   0.4364267   20.75913
l_2              0.6637511   -1.3987826   5.5124671   0.3376480   20.90124      0.11520
l_1              0.5368646   -1.4465520   5.2778061   0.3358224   20.88869      0.09047
l_∞              0.7694412   -1.3829474   4.9731226   0.4491259   20.66893      0.23489

Tab. 1. Parameters for different norms.


H.-P.  Helfrich  and  D.  S.  Zwick 

The number of iterations in each case was five or six. We note that the deviations for the best fit $l_1$ and $l_\infty$ ellipses exhibit behavior typical of these norms: five of the data points lie on the best fit $l_1$ ellipse and there are six deviations of largest magnitude in the $l_\infty$ case.

Abstract

In this paper, a general technique for evaluation of measurements by the method of Least Squares is presented. The input to the method consists of estimates and associated uncertainties of the values of measured quantities together with specified constraints between the measured quantities and any additional quantities about whose values no information is known a priori. The output of the method consists of estimates of both groups of quantities that satisfy the imposed constraints and the uncertainties of these estimates. Techniques for testing the consistency between the estimates obtained by measurement and the imposed constraints are presented. It is shown that linear regression is just a special case of the method. It is also demonstrated that the procedure for evaluation of measurement uncertainty that is currently agreed within the metrology community can be considered as another special case in which no redundant information is available. The practical applicability of the method is demonstrated by two examples.

1  Introduction 

In  1787,  the  French  mathematician  and  physicist  Laplace  (1749-1827)  used  the  method 
of  Least  Squares  to  estimate  8  unknown  orbital  parameters  from  75  discrepant  observa¬ 
tions  of  the  position  of  Jupiter  and  Saturn  taken  over  the  period  1582-1745.  Since  then, 
the  method  of  Least  Squares  has  been  used  extensively  in  data  analysis.  Like  Laplace, 
most people use a special case of the method, known as unweighted linear regression. The calculation of the average and the standard deviation of a repeated set of observations is the simplest example of that. The unweighted regression analysis is based on the assumptions that the observations are independent and have the same (unknown) variance. In addition, the linear regression is based on the assumption that the observations can be modelled by a function that is linear in the unknown quantities to be determined by the regression analysis. For most measurements carried out in practice, none of these assumptions can be justified. In order to evaluate the result of a general measurement, in which some redundant information has been obtained, one therefore has to apply the method of Least Squares in its general form.

This paper describes how measurements can be evaluated by the method of Least Squares in general. The paper is based on an earlier work of the author [2] but includes several new features not published before as well as practical examples from the daily work at DFM. An alternative approach is described in [6].

2  Measurement  model 

In a general measurement, a number $m > 0$ of quantities is either measured directly using measuring instruments or known a priori, for example from tables of physical constants etc. The (exact) values of these $m$ quantities are denoted

$$\zeta = (\zeta_1, \dots, \zeta_m)^T.$$

Due to measurement uncertainty, the values $z$ obtained by the measurement (or from tables etc.)

$$z = (z_1, \dots, z_m)^T$$

are only estimates of the values $\zeta$. The standard uncertainties of the estimates $z_i$,

$$u(z_i), \quad i = 1, \dots, m,$$

are determined in accordance with the GUM [1] and depend on the accuracy of the instruments and the reliability of any tabulated value used. In general, some of the estimates $z_i$ may be correlated. If $r(z_i, z_j)$ is the correlation coefficient between the estimates $z_i$ and $z_j$ then the covariance $u(z_i, z_j)$ between these two estimates is given by

$$u(z_i, z_j) = u(z_i)\, r(z_i, z_j)\, u(z_j).$$

Because of the uncertainty, the estimates $z$ can be considered as an outcome of an $m$-dimensional random variable $Z$ with expectation $\zeta$ (the exact values of the quantities) and covariance matrix $\Sigma$

$$\Sigma = u(z, z^T) = \begin{pmatrix} u^2(z_1) & u(z_1,z_2) & \cdots & u(z_1,z_m) \\ u(z_2,z_1) & u^2(z_2) & \cdots & u(z_2,z_m) \\ \vdots & \vdots & \ddots & \vdots \\ u(z_m,z_1) & u(z_m,z_2) & \cdots & u^2(z_m) \end{pmatrix}.$$
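As a small illustration of the relation $u(z_i, z_j) = u(z_i)\, r(z_i, z_j)\, u(z_j)$, the covariance matrix can be assembled from the standard uncertainties and the correlation coefficients as in the following Python sketch. The function name and the numerical values are hypothetical.

```python
def covariance_matrix(u, r):
    """Covariance matrix with entries u[i] * r[i][j] * u[j].

    u : list of m standard uncertainties u(z_i)
    r : m x m correlation matrix with r[i][i] == 1
    """
    m = len(u)
    return [[u[i] * r[i][j] * u[j] for j in range(m)] for i in range(m)]

# Hypothetical example: two estimates with correlation coefficient 0.5.
u = [0.1, 0.2]
r = [[1.0, 0.5],
     [0.5, 1.0]]
sigma = covariance_matrix(u, r)
```

The diagonal entries reduce to the variances $u^2(z_i)$ because $r(z_i, z_i) = 1$.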

In addition to the $m$ quantities for which prior information is available either from direct measurement or from other sources, a general measurement may involve a number $k > 0$ of quantities for which no prior information is available. The values of these quantities are denoted by

$$\beta = (\beta_1, \dots, \beta_k)^T.$$

In general, the values $\beta$ and $\zeta$ are constrained by a number $n$ of physical or empirical laws. These constraints may be written in terms of an $n$-dimensional function

$$f(\beta, \zeta) = 0, \quad k \le n < k + m. \tag{2.1}$$

It is assumed that $f_i : \Omega \to \mathbb{R}$, $i = 1, \dots, n$, are differentiable functions (with continuous derivatives) defined in a region $\Omega \subset \mathbb{R}^{k+m}$ around $(\beta, \zeta)$. As indicated in (2.1), the number $n$ has to be larger than or equal to the number $k$ of quantities for which no prior information is available; otherwise some of the values $\beta$ cannot be determined. In addition, the number $n$ of constraints has to be smaller than the total number $k + m$ of quantities involved; otherwise the values of $\beta$ and $\zeta$ would be uniquely determined by the constraints and no measurements would be needed.

The estimates $z$, the covariance matrix $\Sigma$ and the $n$-dimensional function $f(\beta, \zeta)$ are the input to the general Least Squares method. It should be stressed that no probability distribution has to be assigned to the input estimates $z$. On the contrary, if a probability distribution has been assigned to an estimate, it should be used to calculate the mean value and the variance of the estimate, which should then serve as input to the Least Squares method.

Like any other covariance matrix, the covariance matrix $u(z, z^T) = \Sigma$ is positive semi-definite. Otherwise, at least one linear combination $x^T z$ of the estimates $z$ would have negative variance $u(x^T z, z^T x) = x^T \Sigma x$. In the following it is assumed that $\Sigma$ is positive definite and therefore non-singular.

3  Normal  equations 

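Least Squares estimates $\hat{\beta}$ and $\hat{\zeta}$ of the values $\beta$ and $\zeta$ are found by minimizing the chi-square function

$$\chi^2(\zeta; z) = (z - \zeta)^T \Sigma^{-1} (z - \zeta)$$

subject to the constraints

$$f(\beta, \zeta) = 0.$$

It is convenient to solve this minimization problem by using Lagrange multipliers [5]: If a solution $(\hat{\beta}, \hat{\zeta})$ to the minimization problem exists, the solution satisfies the equation

$$\nabla \Phi(\beta, \zeta, \lambda; z) = 0,$$

where

$$\Phi(\beta, \zeta, \lambda; z) = (z - \zeta)^T \Sigma^{-1} (z - \zeta) + 2\lambda^T f(\beta, \zeta)$$

for a particular set of Lagrange multipliers $\lambda = (\lambda_1, \dots, \lambda_n)^T$. By taking the gradient of the function $\Phi$, the following $n + m + k$ equations in $(\beta, \zeta, \lambda)$ evolve:

$$\begin{aligned} \nabla_\beta f(\beta, \zeta)^T \lambda &= 0, \\ -\Sigma^{-1}(z - \zeta) + \nabla_\zeta f(\beta, \zeta)^T \lambda &= 0, \\ f(\beta, \zeta) &= 0, \end{aligned} \tag{3.1}$$

where

$$\nabla_\beta f = \begin{pmatrix} \frac{\partial f_1}{\partial \beta_1} & \cdots & \frac{\partial f_1}{\partial \beta_k} \\ \vdots & & \vdots \\ \frac{\partial f_n}{\partial \beta_1} & \cdots & \frac{\partial f_n}{\partial \beta_k} \end{pmatrix} \quad \text{and} \quad \nabla_\zeta f = \begin{pmatrix} \frac{\partial f_1}{\partial \zeta_1} & \cdots & \frac{\partial f_1}{\partial \zeta_m} \\ \vdots & & \vdots \\ \frac{\partial f_n}{\partial \zeta_1} & \cdots & \frac{\partial f_n}{\partial \zeta_m} \end{pmatrix}.$$

The equations (3.1) are called the normal equations of the Least Squares problem.

4 Solving the normal equations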

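If $(\beta_l, \zeta_l, \lambda_l)$ is an approximate solution to the normal equations, a refined solution $(\beta_{l+1}, \zeta_{l+1}, \lambda_{l+1})$ can be found by the iteration

$$\begin{pmatrix} \beta_{l+1} \\ \zeta_{l+1} \\ \lambda_{l+1} \end{pmatrix} = \begin{pmatrix} \beta_l \\ \zeta_l \\ 0 \end{pmatrix} + \begin{pmatrix} \Delta\beta_l \\ \Delta\zeta_l \\ \lambda_{l+1} \end{pmatrix}, \quad l = 1, \dots, \infty.$$

The step $(\Delta\beta_l, \Delta\zeta_l, \lambda_{l+1})$ is given by

$$D(\beta_l, \zeta_l) \begin{pmatrix} \Delta\beta_l \\ \Delta\zeta_l \\ \lambda_{l+1} \end{pmatrix} = \begin{pmatrix} 0^{(k,1)} \\ \Sigma^{-1}(z - \zeta_l) \\ -f(\beta_l, \zeta_l) \end{pmatrix}, \tag{4.1}$$

where

$$D(\beta_l, \zeta_l) = \begin{pmatrix} 0^{(k,k)} & 0^{(k,m)} & \nabla_\beta f(\beta_l, \zeta_l)^T \\ 0^{(m,k)} & \Sigma^{-1} & \nabla_\zeta f(\beta_l, \zeta_l)^T \\ \nabla_\beta f(\beta_l, \zeta_l) & \nabla_\zeta f(\beta_l, \zeta_l) & 0^{(n,n)} \end{pmatrix} \tag{4.2}$$

is a symmetric matrix. This iteration procedure is similar to Newton iteration except that the second order partial derivatives of the functions $f_i$ have been neglected, as is common practice in non-linear Least Squares estimation [4].

In order to reduce the effects of numerical rounding errors, it is recommended to calculate the step $(\Delta\beta_l, \Delta\zeta_l, \lambda_{l+1})$ by solving the linear equations (4.1) by Gauss-Jordan elimination with full pivoting [4]. This algorithm also provides the inverse matrix $D(\beta_l, \zeta_l)^{-1}$, which is needed at the final stage for estimating the covariance matrix of the solution as shown in Section 5.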
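The iteration (4.1)-(4.2) can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the implementation used by the author: the function names are hypothetical, the linear system is solved by ordinary Gaussian elimination with partial pivoting (a simplification of the full-pivot Gauss-Jordan elimination recommended above, and one that does not return the inverse of $D$), and a fixed number of iterations replaces a convergence test. The example treats two measurements of the same quantity $\beta$, with constraints $\zeta_1 - \beta = 0$ and $\zeta_2 - \beta = 0$, so that $k = 1$, $m = 2$, $n = 2$.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    size = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, size):
            factor = aug[row][col] / aug[col][col]
            for j in range(col, size + 1):
                aug[row][j] -= factor * aug[col][j]
    x = [0.0] * size
    for row in range(size - 1, -1, -1):
        tail = sum(aug[row][j] * x[j] for j in range(row + 1, size))
        x[row] = (aug[row][size] - tail) / aug[row][row]
    return x

def least_squares(z, sigma_inv, f, jac_beta, jac_zeta, beta, n_iter=5):
    """Iterate (4.1); zeta starts at the estimates z."""
    k, m = len(beta), len(z)
    zeta, lam = list(z), []
    for _ in range(n_iter):
        fv, fb, fz = f(beta, zeta), jac_beta(beta, zeta), jac_zeta(beta, zeta)
        n = len(fv)
        size = k + m + n
        d = [[0.0] * size for _ in range(size)]      # matrix D of (4.2)
        for i in range(m):
            for j in range(m):
                d[k + i][k + j] = sigma_inv[i][j]
        for i in range(n):
            for j in range(k):
                d[k + m + i][j] = d[j][k + m + i] = fb[i][j]
            for j in range(m):
                d[k + m + i][k + j] = d[k + j][k + m + i] = fz[i][j]
        resid = [z[i] - zeta[i] for i in range(m)]
        rhs = ([0.0] * k
               + [sum(sigma_inv[i][j] * resid[j] for j in range(m)) for i in range(m)]
               + [-v for v in fv])
        step = solve_linear(d, rhs)
        beta = [b + s for b, s in zip(beta, step[:k])]
        zeta = [c + s for c, s in zip(zeta, step[k:k + m])]
        lam = step[k + m:]
    return beta, zeta, lam

# Hypothetical data: z = (10.0, 10.4) with standard uncertainties 0.1 and 0.2.
z = [10.0, 10.4]
sigma_inv = [[100.0, 0.0], [0.0, 25.0]]
beta_hat, zeta_hat, lam_hat = least_squares(
    z, sigma_inv,
    f=lambda b, c: [c[0] - b[0], c[1] - b[0]],
    jac_beta=lambda b, c: [[-1.0], [-1.0]],
    jac_zeta=lambda b, c: [[1.0, 0.0], [0.0, 1.0]],
    beta=[0.0])
```

For these linear constraints the result is the inverse-variance weighted mean of the two measurements, which gives a simple check of the sketch.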

If proper starting values $(\beta_1, \zeta_1)$ are selected, the iteration is expected to converge towards the solution $(\hat{\beta}, \hat{\zeta})$.

Since the solutions $\hat{\zeta}$ are expected to be close to the estimates $z$ of $\zeta$ available a priori, the estimates $z$ are obviously the proper starting values $\zeta_1$ to be selected for the iteration. The selection of proper starting values $\beta_1$ is more difficult in general. If, however, $f(\beta, \zeta)$ are linear functions in the variables $\beta$, the iteration process will converge after a few iterations, independent of the choice of $\beta_1$.

Most differentiable functions $f(\beta, \zeta)$ can be handled by the described method. In order to get reliable standard uncertainties, it is required, however, that the function can be approximated by a first order Taylor expansion, i.e.

$$f(\beta, \zeta) = f(\hat{\beta}, \hat{\zeta}) + \nabla_\beta f(\hat{\beta}, \hat{\zeta})(\beta - \hat{\beta}) + \nabla_\zeta f(\hat{\beta}, \hat{\zeta})(\zeta - \hat{\zeta})$$

when the values $\beta$ and $\zeta$ are varied around the solution $\hat{\beta}$ and $\hat{\zeta}$ on a scale comparable to the standard uncertainties of the solution. If this vaguely formulated criterion is met, the function $f(\beta, \zeta)$ is said to be linearizable. Note that almost any differentiable function will be linearizable if the standard uncertainties are sufficiently small. On the other hand, if the uncertainties are sufficiently large, no non-linear function will be linearizable. The requirement that $f(\beta, \zeta)$ is linearizable is considered to be the only major limitation of the method of Least Squares.

It should be mentioned that the minimization using Lagrange multipliers will fail if the gradients $\nabla_\beta f_i$ and $\nabla_\zeta f_i$ of one of the constraint functions $f_i$ are both equal to zero at the point of the solution $(\hat{\beta}, \hat{\zeta})$. This places some restrictions on how a constraint may be formulated. For example, a function $f_i$ defining a constraint must not be replaced by the square of that function, $f_i^2$: since $f_i(\hat{\beta}, \hat{\zeta}) = 0$, the gradient of $f_i^2$ will be zero at the point of the solution $(\hat{\beta}, \hat{\zeta})$ although the gradient of $f_i$ is not.

5  Properties  of  the  solution 

Since the solution $(\hat{\beta}, \hat{\zeta}, \hat{\lambda})$ depends on the estimates $z$, which are considered as the realization of the multivariate random variable $Z$, the solution $(\hat{\beta}, \hat{\zeta}, \hat{\lambda})$ can also be regarded as a multivariate random variable. If the functions $f_i(\beta, \zeta)$ are linearizable, the estimates $(\hat{\beta}, \hat{\zeta})$ are linear in $Z$:

$$\begin{pmatrix} \hat{\beta} \\ \hat{\zeta} \\ \hat{\lambda} \end{pmatrix} = \begin{pmatrix} \beta \\ \zeta \\ 0 \end{pmatrix} + D(\beta, \zeta)^{-1} \begin{pmatrix} 0^{(k,1)} \\ \Sigma^{-1}(Z - \zeta) \\ 0^{(n,1)} \end{pmatrix}. \tag{5.1}$$

In that case, the expectation of the solution is

$$E \begin{pmatrix} \hat{\beta} \\ \hat{\zeta} \\ \hat{\lambda} \end{pmatrix} = \begin{pmatrix} \beta \\ \zeta \\ 0 \end{pmatrix},$$

which means that $(\hat{\beta}, \hat{\zeta})$ are central estimators of the values $(\beta, \zeta)$. Under the same assumption, the covariances of the solution are given by the symmetric matrix $D(\hat{\beta}, \hat{\zeta})^{-1}$ provided by the Gauss-Jordan elimination algorithm²

$$\begin{pmatrix} u(\hat{\beta}, \hat{\beta}^T) & u(\hat{\beta}, \hat{\zeta}^T) & (\;)^{(k,n)} \\ u(\hat{\zeta}, \hat{\beta}^T) & u(\hat{\zeta}, \hat{\zeta}^T) & (\;)^{(m,n)} \\ (\;)^{(n,k)} & (\;)^{(n,m)} & -u(\hat{\lambda}, \hat{\lambda}^T) \end{pmatrix} = D(\beta, \zeta)^{-1} \approx D(\hat{\beta}, \hat{\zeta})^{-1}. \tag{5.2}$$

This relation can be derived as follows. Partition the symmetric matrix $D^{-1}$ into nine sub-matrices similar to the left hand side of (5.2) or similar to the partitioning of $D$ according to the definition (4.2). Express the covariance matrix of the solution (5.1) in terms of the covariance matrix $\Sigma$ of the random variable $Z$ and the matrix $D^{-1}$. Insert the partitioned $D^{-1}$ into the resulting matrix double product and express the covariances of the solution in terms of $\Sigma^{-1}$ and the sub-matrices of $D^{-1}$. Reduce these expressions to the final result by multiple use of the nine relations between the sub-matrices of $D^{-1}$ and $D$ derived from the identity $D D^{-1} = I$.

² The empty brackets in the left hand matrix indicate the parts of $D^{-1}$ that do not contain information about covariances.

From equations (5.1) and (5.2) the covariance matrices between $(\hat{\beta}, \hat{\zeta})$ and the estimates $z$ are found to be

$$u(z, \hat{\beta}^T) = u(\hat{\zeta}, \hat{\beta}^T),$$

$$u(z, \hat{\zeta}^T) = u(\hat{\zeta}, \hat{\zeta}^T).$$

From the last of these two relations, a relation of particular interest is derived,

$$u(z - \hat{\zeta}, z^T - \hat{\zeta}^T) = u(z, z^T) - u(\hat{\zeta}, \hat{\zeta}^T).$$

For the diagonal elements, this relation reads

$$u^2(z_i - \hat{\zeta}_i) = u^2(z_i) - u^2(\hat{\zeta}_i), \quad i = 1, \dots, m.$$

That is, the variance of the difference between the initial estimate $z_i$ of $\zeta_i$ and the refined estimate $\hat{\zeta}_i$ is equal to the variance of $z_i$ minus the variance of $\hat{\zeta}_i$. This relation is useful when testing if the difference $z_i - \hat{\zeta}_i$ is significantly different from its zero expectation.

6  x2  test  for  consistency 

When  the  estimates  (/3,  C)  have  been  found,  the  minimum  %2.  value 

x2(C;z)  =  (z  -  C)Ts_1(z  -  C) 

can  be  used  to  test  if  the  measured  values  z  are  consistent  with  the  measurement  model 
(2.1)  within  the  uncertainties  defined  by  the  covariance  matrix  X.  If  the  model  is  lin- 
earizable,  the  expectation  of  the  random  variable  x2(C;  Z)  is  equal  to  the  number  m  of 
measured  quantities,  minus  the  number  m  +  k  of  adjusted  quantities,  plus  the  number 
n  of  constraints,  that  is 

E  x2(C;  Z)j  =  m  —  (m  +  k)  +  n  =  n  —  h  —  v. 

If,  in  addition,  the  random  variables  Z  are  assumed  to  follow  a  multivariate  normal 
distribution  with  mean  values  £  and  covariance  matrix  X,  the  random  variable  x2(C;  Z) 
will  follow  a  x2M  distribution  with  v  —  n  —  k  degrees  of  freedom.  In  that  case,  the 
probability  p  of  finding  a  x2  value  larger  than  the  value  X2(C;z)  actually  observed  can 
be  calculated  from  the  x2(u)  distribution 

v  =  ^(x2^)  >  x2(C,z)|  =  i  -  P{x2M  <  x2(C,z)}. 

If this probability $p$ is smaller than a certain value $\alpha$, the hypothesis that the measured
values are consistent with the measurement model has to be rejected at a level of
significance equal to $\alpha$. As the results of measurements are normally quoted at a 95% level
of confidence, an $\alpha = 5\%$ level of significance is a reasonable choice for the consistency
test.
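The consistency test can be sketched in a few lines of Python. The sketch below is illustrative only: the function names `upper_gamma_q`, `chi2_p_value` and `is_consistent` are hypothetical, and the $\chi^2(\nu)$ tail probability is evaluated with a standard series and continued-fraction expansion of the regularized incomplete gamma function rather than with any method described in this paper.

```python
import math


def upper_gamma_q(a, x):
    """Regularized upper incomplete gamma function Q(a, x), a > 0, x >= 0."""
    if x <= 0.0:
        return 1.0
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1.0:
        # Series for the lower function P(a, x); then Q = 1 - P.
        term = 1.0 / a
        total = term
        n = 0
        while abs(term) > abs(total) * 1e-16 and n < 10000:
            n += 1
            term *= x / (a + n)
            total += term
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction for Q(a, x), modified Lentz method.
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return h * math.exp(log_prefactor)


def chi2_p_value(chi2_observed, nu):
    """p = P{chi2(nu) > chi2_observed} for nu degrees of freedom."""
    return upper_gamma_q(nu / 2.0, chi2_observed / 2.0)


def is_consistent(chi2_observed, nu, alpha=0.05):
    """True if the consistency hypothesis is not rejected at level alpha."""
    return chi2_p_value(chi2_observed, nu) >= alpha
```

For instance, `chi2_p_value(8.6, 13)` evaluates the probability quoted for the balance example in Section 9, and `chi2_p_value(66.6, 37)` the one quoted in Section 10.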

Although the assumption of a normal distribution of $Z$ may not be fulfilled, it is
suggested to carry out the test of consistency as described above anyway. This is justified
by the fact that a value of $\chi^2(\hat{\zeta}; z)$ significantly higher than the expectation $\nu$ indicates
inconsistency no matter what the distribution of $Z$ might be. The calculated probability
$p$ simply describes how unlikely the observed $\chi^2$ value is if a normal distribution is
assigned to $Z$.

7 Normalized deviations

If the test described in the previous section leads to a rejection of the measurements,
a tool for identifying the outlying measurements is desirable. A measured value $z_i$ is
defined as an outlier if the difference $z_i - \hat{\zeta}_i$ is significantly different from zero taking
into account the standard uncertainty $u(z_i - \hat{\zeta}_i)$ of that difference. This leads to the
introduction of the normalized deviation $d_i$ defined by³

$$d_i = \frac{z_i - \hat{\zeta}_i}{u(z_i - \hat{\zeta}_i)} = \frac{z_i - \hat{\zeta}_i}{\sqrt{u^2(z_i) - u^2(\hat{\zeta}_i)}}, \quad i = 1, \ldots, m.$$

The normalized deviation $d_i$ has zero expectation and variance 1. A normalized deviation
with $|d_i|$ larger than 2 or 3 is therefore rather unlikely no matter what the
distribution of the random variable $d_i$ might be.

If a multivariate normal distribution is assigned to $Z$ and the model function $\mathbf{f}(\beta, \zeta)$
is linearizable, the normalized deviation $d_i$ is normally distributed,

$$d_i \in N(0, 1), \quad i = 1, \ldots, m.$$

In that case

$$P\{|d_i| > 2\} \approx 5\%,$$

and a measurement with $|d_i| > 2$ is therefore identified as an outlier at a 5% level of
significance. It is suggested to use the criterion $|d_i| > 2$ to identify potential outliers even
if the distribution assigned to $Z$ is not normal.
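As an illustration, the normalized deviations and the outlier criterion can be computed as follows. This is a minimal sketch: the function names and the sample numbers are hypothetical, and a vanishing uncertainty of the difference is handled by setting $d_i = 0$, as suggested in footnote 3.

```python
import math


def normalized_deviations(z, zeta_hat, u_z, u_zeta_hat):
    """d_i = (z_i - zeta_hat_i) / sqrt(u^2(z_i) - u^2(zeta_hat_i))."""
    d = []
    for zi, ci, uzi, uci in zip(z, zeta_hat, u_z, u_zeta_hat):
        var_diff = uzi ** 2 - uci ** 2  # u^2(z_i - zeta_hat_i)
        if var_diff <= 0.0:
            # No redundant information on this quantity: set d_i = 0.
            d.append(0.0)
        else:
            d.append((zi - ci) / math.sqrt(var_diff))
    return d


def flag_outliers(d, limit=2.0):
    """Indices of the values whose normalized deviation exceeds the limit."""
    return [i for i, di in enumerate(d) if abs(di) > limit]


# Hypothetical input: three measured values and their adjusted estimates.
d = normalized_deviations(
    z=[10.0, 5.0, 1.0],
    zeta_hat=[9.7, 5.9, 1.0],
    u_z=[0.5, 0.5, 0.2],
    u_zeta_hat=[0.4, 0.4, 0.2],
)
outliers = flag_outliers(d)
```

In this made-up example only the second value is flagged, since its deviation of 0.9 is three times the standard uncertainty 0.3 of the difference.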

8 Adjustment of a variance $\sigma^2$

If some values $z_i$ have a common but unknown variance $u^2(z_i) = \sigma^2$, this variance can
be estimated by adjusting $\sigma^2$ by an iterative procedure until the “observed” $\chi^2$ value
becomes equal to its expectation value $\nu$,

$$\chi^2(\hat{\zeta}; z) = (z - \hat{\zeta})^T \Sigma^{-1} (z - \hat{\zeta}) = \nu,$$

where the covariance matrix $\Sigma$ is a function of the unknown variance $\sigma^2$. As the estimates
$\hat{\beta}$ depend on the value assigned to $\sigma^2$, these estimates have to be updated together with
the estimates $\hat{\zeta}$ each time the value of $\sigma^2$ is changed during the iteration.

This way of estimating the unknown variance $\sigma^2$ leads to the well-known expression
for the standard deviation in the case of a repeated measurement of a single quantity, as
shown in Section 13.

³ If $u(z_i - \hat{\zeta}_i) = 0$, the difference $z_i - \hat{\zeta}_i$ will be zero as well and $d_i$ may be set equal to zero. This situation occurs
whenever there is no redundant information available regarding the value of the quantity $\zeta_i$.

9 Example: Calibration of an analytical balance

An analytical balance with capacity Max = 220 g, resolution d = 0.1 mg, and built-in
adjustment weight was calibrated by DFM in October 1999 during an inter-laboratory
measurement comparison piloted by DFM. Two mass standards were used as reference
standards. One of them was a traditional 200 g weight (named R200g) of known
conventional mass value⁴ $m_R$ and density $\rho_R$. The other reference standard was a specially
designed 200 g stack of weights consisting of four discs (named 100g, 50g, 25g and 25g*)
machined from the same metal bar of known density $\rho$. The conventional mass values
$m_1$, $m_2$, $m_3$, $m_4$, respectively, of these four discs were not known a priori; only the
conventional mass value $m_S = m_1 + m_2 + m_3 + m_4$ of the stack was known.

The calibration was performed by placing a weight combination on the weighing pan
of the balance and by recording the corresponding average indication $I$ in the display.

A total of 18 weight combinations were used. Each weight combination was weighed 3
times, from which the average indication was calculated. The calibration was repeated 4
times during a period of 10 days in which the inter-laboratory comparison took place.

From these four calibrations, a grand average indication $I_i$, $i = 1, \ldots, 18$, was calculated
for each of the 18 weight combinations specified in Table 1. The standard uncertainty of
the grand average was estimated from the observed variation in indication over the four
calibrations.

Tab. 1. The weight combinations corresponding to the 18 balance indications $I_i$.

Due to the effect of air buoyancy, the balance indication depends not only on the
mass of the weighed body, but also on the density of the body as well as the density of
the air. When calibrated in air of known density $a$, the reference indication $I_R$ of the
balance corresponding to a load generated by a weight with conventional mass value $m$
and density $\rho$ is given by

$$I_R = m \left(1 - (a - a_0)\left(\frac{1}{\rho} - \frac{1}{\rho_0}\right)\right),$$

where $a_0 = 1.2$ kg/m³ and $\rho_0 = 8000$ kg/m³ are the reference densities of the air and the
weight respectively to which the conventional value of mass refers. As a model for the

⁴ The conventional mass value of a body is defined as the mass of a hypothetical weight of density 8000 kg/m³
that balances the body when weighed in air of density 1.2 kg/m³ and temperature 20 °C.




Lars  Nielsen 


        m_S [g]       m_R [g]       ρ_R [kg/m³]   ρ [kg/m³]     a [kg/m³]     I_1 [div]
z       199.988816    199.999043    7833.01       7965.76       1.1950        199.988617
u(z)    0.000010      0.000008      0.29          0.71          0.0035        0.000023
ζ̂       199.988814    199.999043    7833.01       7965.76       1.1946        199.988620
u(ζ̂)    0.000010      0.000008      0.29          0.71          0.0035        0.000011
d       1.66          -1.66         1.66          -1.66         1.66          -0.16

        I_2 [div]     I_3 [div]     I_4 [div]     I_5 [div]     I_6 [div]     I_7 [div]
z       199.988608    174.992133    175.009992    150.013558    149.980675    125.002083
u(z)    0.000023      0.000023      0.000023      0.000023      0.000023      0.000023
ζ̂       199.988620    174.992149    175.010024    150.013558    149.980672    125.002087
u(ζ̂)    0.000011      0.000012      0.000012      0.000013      0.000012      0.000014
d       -0.56         -0.77         -1.61         0.03          0.14          -0.20

        I_8 [div]     I_9 [div]     I_10 [div]    I_11 [div]    I_12 [div]    I_13 [div]
z       124.984217    100.005650    99.982925     74.986433     75.004325     50.007892
u(z)    0.000023      0.000023      0.000023      0.000023      0.000023      0.000023
ζ̂       124.984212    100.005632    99.982899     74.986450     75.004325     50.007881
u(ζ̂)    0.000014      0.000013      0.000013      0.000014      0.000014      0.000012
d       0.25          0.93          1.38          -0.87         0.03          0.54

        I_14 [div]    I_15 [div]    I_16 [div]    I_17 [div]    I_18 [div]
z       49.974992     24.978533     24.996417     199.998867    199.998875
u(z)    0.000023      0.000023      0.000023      0.000023      0.000023
ζ̂       49.974995     24.978557     24.996432     199.998851    199.998851
u(ζ̂)    0.000013      0.000011      0.000011      0.000011      0.000011
d       -0.19         -1.17         -0.77         0.78          1.19

        f [g/div]     A [1/div]     m_1 [g]       m_2 [g]       m_3 [g]       m_4 [g]
β̂       1.00000186    -4.4E-09      100.005774    50.007963     24.978601     24.996476
u(β̂)    0.00000019    1.0E-09       0.000011      0.000010      0.000010      0.000010

Tab. 2. Measured and estimated values and associated standard uncertainties.


calibration curve of the balance, a second order polynomial through zero is assumed,

$$I_R = f\,(I + A I^2),$$

where $f$ and $A$ are unknown quantities to be determined from the calibration data.
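The two model ingredients of this example can be written down as small functions. The sketch below assumes that the buoyancy-corrected reference indication has the form $m\,(1 - (a - a_0)(1/\rho - 1/\rho_0))$ that appears in the constraints of this section, and that the calibration curve is $f\,(I + A I^2)$; all function and constant names are hypothetical.

```python
A0 = 1.2       # reference air density, kg/m^3
RHO0 = 8000.0  # reference weight density, kg/m^3


def reference_indication(m, rho, a):
    """Reference indication for conventional mass m, density rho, air density a."""
    return m * (1.0 - (a - A0) * (1.0 / rho - 1.0 / RHO0))


def calibration_curve(indication, f, A):
    """Second order polynomial through zero: f * (I + A * I^2)."""
    return f * (indication + A * indication ** 2)


def constraint_residual(m, rho, a, indication, f, A):
    """One constraint of the model; zero when model and data agree exactly."""
    return reference_indication(m, rho, a) - calibration_curve(indication, f, A)
```

At the reference conditions (air density 1.2 kg/m³ or weight density 8000 kg/m³) the buoyancy correction vanishes and `reference_indication` returns the conventional mass itself.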

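In this example, there are $m = 23$ quantities for which prior information is available
from the measurements performed:

$$\zeta = (m_S, m_R, \rho_R, \rho, a, I_1, \ldots, I_{18})^T,$$

whereas there are $k = 6$ quantities for which no prior information is available:

$$\beta = (f, A, m_1, m_2, m_3, m_4)^T.$$

Between these quantities, there are $n = 19$ constraints:

$$\mathbf{f}(\beta, \zeta) = \begin{pmatrix}
(m_1 + m_2 + m_3 + m_4)\left(1 - (a - a_0)\left(\frac{1}{\rho} - \frac{1}{\rho_0}\right)\right) - f\,(I_1 + A I_1^2) \\
\vdots \\
m_R\left(1 - (a - a_0)\left(\frac{1}{\rho_R} - \frac{1}{\rho_0}\right)\right) - f\,(I_{18} + A I_{18}^2) \\
m_S - (m_1 + m_2 + m_3 + m_4)
\end{pmatrix} = 0.$$

Tab. 3. Correlation coefficients of the estimated $\hat{\beta}$ values.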

The measured values $z$ and associated standard uncertainties are given in Table 2
under the row headings $z$ and $u(z)$. All measured values are assumed to be uncorrelated.

By solving the normal equations, the estimates $\hat{\zeta}$ and $\hat{\beta}$ and associated standard
uncertainties given in Table 2 under the row headings $\hat{\zeta}$, $u(\hat{\zeta})$, $\hat{\beta}$ and $u(\hat{\beta})$ are obtained.
Selected correlation coefficients derived from $D(\hat{\beta}, \hat{\zeta})^{-1}$ are given in Table 3. The observed
minimum $\chi^2$ value is $\chi^2(\hat{\zeta}; z) = 8.6$, which should be compared to the expectation
value $\nu = n - k = 19 - 6 = 13$. Since $P\{\chi^2(13) > 8.6\} = 80.3\%$, it is concluded that
the measured values are consistent with the specified constraints taking into account the
measurement uncertainties. This conclusion is confirmed by the calculated normalized
deviations given in Table 2 under the row heading $d$: all normalized deviations satisfy
the criterion $|d| < 2$.

From the estimates of the quantities $f$ and $A$ and the associated covariance matrix,
the error of indication $E$, defined as

$$E = I - I_R = I - f\,(I + A I^2),$$

and the associated standard uncertainty $u(E)$ can be calculated as a function of the
indication $I$. The result is shown in Figure 1 as the full lines representing $E - u(E)$, $E$,
and $E + u(E)$. The measured points $E_i$, $i = 1, \ldots, 18$, shown in the figure are the observed
average balance indications $I_i$ minus the corresponding reference values $I_R$. The error
bars of the measured points indicate the standard uncertainties $u(E_i)$ that have been
calculated taking into account the covariance between $I_i$ and $I_R$.
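A possible way to evaluate the curve $E(I)$ and its uncertainty band is linear propagation of the uncertainties of $f$ and $A$ at a fixed indication $I$. The following is only a sketch under that assumption: the function names are hypothetical, and the correlation coefficient `r_fA` between the two estimates must be supplied by the user (its value belongs to Table 3, which is not reproduced here).

```python
import math


def error_of_indication(I, f, A):
    """E = I - f * (I + A * I^2)."""
    return I - f * (I + A * I ** 2)


def u_error_of_indication(I, f, A, u_f, u_A, r_fA):
    """Standard uncertainty of E at a fixed indication I (linear propagation)."""
    c_f = -(I + A * I ** 2)  # sensitivity dE/df
    c_A = -f * I ** 2        # sensitivity dE/dA
    var = (c_f * u_f) ** 2 + (c_A * u_A) ** 2 + 2.0 * c_f * c_A * r_fA * u_f * u_A
    return math.sqrt(max(var, 0.0))


# Estimates of f and A from Table 2; r_fA = 0.0 is a placeholder, not a result.
E_200 = error_of_indication(200.0, 1.00000186, -4.4e-9)
u_200 = u_error_of_indication(200.0, 1.00000186, -4.4e-9, 1.9e-7, 1.0e-9, 0.0)
```

Because both sensitivities have the same sign, a negative correlation between $f$ and $A$ reduces $u(E)$ relative to the uncorrelated placeholder used above.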

Fig. 1. Error of indication of the calibrated balance.

10 Example: Evaluation of calibration history

A weight (named R1mg) of nominal mass 1 mg has been calibrated 39 times in the
period 1992-2001. For calibration number $i$, the mass $m_i$ of the weight at the time
$t_i$ and the associated standard uncertainties $u(m_i)$ and $u(t_i)$ are given. The calibration
history of the weight is shown in Figure 2 as dots with error bars indicating the standard
uncertainties; the scale mark 1992-01 on the time axis indicates the position of the date
1 January 1992 etc.

Due to wear and changes in the amount of dirt adsorbed to the surface, the mass of
the weight is expected to change in time. A reasonable model of the change in mass as a
function of time is a superposition of a deterministic linear drift and a random variation,

$$m_i = a_1 + a_2 t_i + \delta m_i, \quad i = 1, \ldots, 39,$$

where $\delta m_i$ is a random variable with zero expectation and variance $\sigma^2$. The drift
parameters $a_1$, $a_2$ and the associated covariance matrix as well as the variance $\sigma^2$ are unknown
a priori and are to be estimated from the calibration history available. Once the
estimates $\hat{a}_1$ and $\hat{a}_2$ have been found, it is possible to predict a value $m$ of the mass of the
weight as a function of time $t$,

$$m = \hat{a}_1 + \hat{a}_2 t + \delta m,$$

where $\delta m = 0$ with standard uncertainty $u(\delta m) = \sigma$. The standard uncertainty of the
predicted mass value is given by

$$u^2(m) = u^2(\hat{a}_1) + t^2 u^2(\hat{a}_2) + 2t\,u(\hat{a}_1, \hat{a}_2) + \sigma^2.$$
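The prediction formula translates directly into code. The helper below is a hypothetical illustration (names and sample numbers are not from the paper); it returns the predicted mass with $\delta m$ set to its expectation zero, together with the standard uncertainty from the expression above.

```python
import math


def predict_mass(t, a1, a2, u_a1, u_a2, cov_a1a2, sigma):
    """Predicted mass at time t and its standard uncertainty."""
    m = a1 + a2 * t  # delta m is set to its expectation value, zero
    var = u_a1 ** 2 + t ** 2 * u_a2 ** 2 + 2.0 * t * cov_a1a2 + sigma ** 2
    return m, math.sqrt(var)


# Made-up numbers, chosen only to keep the arithmetic easy to follow.
m_pred, u_pred = predict_mass(t=2.0, a1=1.0, a2=0.5, u_a1=0.3, u_a2=0.1,
                              cov_a1a2=-0.01, sigma=0.4)
```

Note that the random-variation term $\sigma^2$ sets a floor on the prediction uncertainty that no amount of additional calibration data removes.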

The measurement model used for evaluating the calibration history is

$$\zeta = (m_1, \ldots, m_{39}, t_1, \ldots, t_{39}, \delta m_1, \ldots, \delta m_{39})^T, \quad \beta = (a_1, a_2)^T,$$

$$\mathbf{f}(\beta, \zeta) = \begin{pmatrix} m_1 - (a_1 + a_2 t_1 + \delta m_1) \\ \vdots \\ m_{39} - (a_1 + a_2 t_{39} + \delta m_{39}) \end{pmatrix} = 0.$$

Fig. 2. Evaluation of the calibration history of a 1 mg weight assuming that $\sigma = 0$.

Fig. 3. Evaluation of the calibration history of the 1 mg weight with $\sigma$ adjusted
to 0.092 µg. Plot annotations: R1mg (after adjustment of $\sigma$); $\nu = 37$, $\chi^2 = 37.0$,
$\sigma = 0.092$ µg; $P\{\chi^2(37) > 37.0\} = 46.7\%$.

The measured values $z$ are given by the calibration history, except for the values of
$\delta m_i$, $i = 1, \ldots, 39$, which are set equal to the expectation value zero. The associated
covariance matrix $u(z, z^T) = \Sigma$ is built up from the uncertainties $u(m_i)$ and $u(t_i)$ available
from the calibration history and a negligible but finite⁵ initial value of the unknown
variance $\sigma^2$. Since the standard uncertainties $u(m_i)$ are of the order 0.1 µg, the value
$\sigma$ = 1E-07 µg is considered negligible and is selected as a starting point.

By solving the normal equations, estimates $\hat{a}_1$ and $\hat{a}_2$ of the drift parameters and the
associated covariance matrix are found after a few iterations. The predicted value $m$ of
the mass of the weight and the associated standard uncertainty $u(m)$ as a function of
time are shown in Figure 2 as solid lines. The normalized deviations $d$ associated with
the mass values $m_i$ are shown in Figure 2 as well⁶. The observed minimum chi-square
value is $\chi^2 = 66.6$, which is large compared to the expectation value $\nu = 39 - 2 = 37$.

Since $P\{\chi^2(37) > 66.6\} = 0.2\%$, the hypothesis $\sigma = 0$, or no random variation in the
mass, is rejected at a 0.2% level of significance.

The value of $\sigma$ is therefore increased as described in Section 8 until the calculated
minimum $\chi^2$ value becomes equal to its expectation value $\nu = 37$. In this way the
standard uncertainty reflecting the random variation of the mass of the weight is found
to be $\sigma = 0.092$ µg. The result of the evaluation of the calibration history after adjustment
of $\sigma$ is shown in Figure 3. Note the significant increase in the standard uncertainty of
the predicted value of the mass of the weight and the decrease in the absolute value of
the normalized deviations $d$.

The  calibration  history  can  also  be  evaluated  by  an  iterative  technique  based  on  linear 
regression  [3].  The  results  obtained  are  identical  to  the  results  presented  in  this  section. 
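To make the adjustment of $\sigma$ concrete, the following sketch fits a weighted straight line and increases $\sigma$ by bisection until $\chi^2$ equals $\nu = n - 2$. It is a simplified illustration, not the algorithm of this paper or of reference [3]: the uncertainties of the times $t_i$ are neglected, the function names are hypothetical, and the sample data are made up.

```python
def fit_line(t, m, var):
    """Weighted least-squares line m = a1 + a2 * t; returns (a1, a2, chi2)."""
    w = [1.0 / v for v in var]
    s = sum(w)
    st = sum(wi * ti for wi, ti in zip(w, t))
    sm = sum(wi * mi for wi, mi in zip(w, m))
    stt = sum(wi * ti * ti for wi, ti in zip(w, t))
    stm = sum(wi * ti * mi for wi, ti, mi in zip(w, t, m))
    det = s * stt - st * st
    a1 = (stt * sm - st * stm) / det
    a2 = (s * stm - st * sm) / det
    chi2 = sum(wi * (mi - a1 - a2 * ti) ** 2 for wi, ti, mi in zip(w, t, m))
    return a1, a2, chi2


def adjust_sigma(t, m, u_m, tol=1e-12, max_iter=200):
    """Smallest sigma >= 0 for which chi2 equals nu = n - 2 (0 if already below)."""
    nu = len(m) - 2

    def chi2_for(sigma):
        return fit_line(t, m, [u * u + sigma * sigma for u in u_m])[2]

    if chi2_for(0.0) <= nu:
        return 0.0
    lo, hi = 0.0, max(u_m)
    while chi2_for(hi) > nu:  # chi2 decreases as sigma grows
        hi *= 2.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if chi2_for(mid) > nu:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol * hi:
            break
    return 0.5 * (lo + hi)


# Made-up calibration history: six points with equal standard uncertainty.
t = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
m = [0.0, 1.2, 1.9, 3.1, 3.8, 5.2]
u_m = [0.05] * 6
sigma = adjust_sigma(t, m, u_m)
```

Bisection is used here because $\chi^2$ decreases monotonically as $\sigma$ grows, so the root is unique whenever the initial $\chi^2$ exceeds $\nu$.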

⁵ If the variance $\sigma^2$ is assumed to be exactly zero, the quantities $\delta m_i$ have to be removed from the model. Otherwise
the covariance matrix $\Sigma$ will be singular.

⁶ The absolute value of the normalized deviations of $t_i$ and $\delta m_i$ is equal to the absolute value of the normalized
deviation of $m_i$.

11 Case I: Univariate output quantity, $Y = h(X_1, \ldots, X_N)$

In this section it is shown that the evaluation of measurements by the method of least
squares is consistent with the generally accepted principles for evaluating measurement
uncertainty as described in the GUM [1].

Using the nomenclature of the GUM, a univariate output quantity $Y$ is assumed to
be related to $N$ input quantities $X_1, \ldots, X_N$ through a specified function $h$,

$$Y = h(X_1, \ldots, X_N).$$

The values assigned to the input and output quantities are denoted by $x_1, \ldots, x_N$ and
by $y$ respectively. In the nomenclature of this paper, the measurement model is

$$\zeta = (X_1, \ldots, X_N)^T, \quad \beta = (Y),$$

$$\mathbf{f}(\beta, \zeta) = (Y - h(X_1, \ldots, X_N)) = 0.$$

The measured values are

$$z = (x_1, \ldots, x_N)^T$$

with the known covariance matrix

$$\Sigma = u(z, z^T) = \begin{pmatrix} u^2(x_1) & \cdots & u(x_1, x_N) \\ \vdots & \ddots & \vdots \\ u(x_N, x_1) & \cdots & u^2(x_N) \end{pmatrix}.$$

The coefficient matrix $D$ of the normal equations is

$$D(\beta, \zeta) = \begin{pmatrix} 0 & 0^{(1,N)} & 1 \\ 0^{(N,1)} & \Sigma^{-1} & -\nabla_x h(x)^T \\ 1 & -\nabla_x h(x) & 0 \end{pmatrix},$$

where

$$\nabla_x h = \left(\frac{\partial h}{\partial x_1}, \ldots, \frac{\partial h}{\partial x_N}\right).$$

In the present case, the solution to the normal equations is found after one iteration,

$$y = \hat{\beta} = h(x), \quad \hat{\zeta} = (x_1, \ldots, x_N)^T, \quad \lambda = 0.$$

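The associated covariances are given by the blocks of the inverse of the coefficient matrix,

$$D(\hat{\beta}, \hat{\zeta})^{-1} = \begin{pmatrix} \nabla_x h(x)\,\Sigma\,\nabla_x h(x)^T & \nabla_x h(x)\,\Sigma & 1 \\ \Sigma\,\nabla_x h(x)^T & \Sigma & 0^{(N,1)} \\ 1 & 0^{(1,N)} & 0 \end{pmatrix},$$

in which the upper left blocks are $u^2(y)$, $u(y, \hat{\zeta}^T)$, $u(\hat{\zeta}, y)$ and $u(\hat{\zeta}, \hat{\zeta}^T)$, and the lower
right block is $-u^2(\lambda)$.

In other words,

$$u^2(y) = \nabla_x h(x)\,\Sigma\,\nabla_x h(x)^T = \sum_{i=1}^{N} \sum_{j=1}^{N} c_i\, u(x_i, x_j)\, c_j, \quad c_i = \frac{\partial h}{\partial x_i},$$

which is identical to the linear variance propagation formula given in the GUM.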
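The variance propagation formula is easy to check numerically. The sketch below is illustrative only: the function names are hypothetical, and the sensitivity coefficients $c_i$ are approximated by central finite differences, which is a choice made for this example and not part of the method described above.

```python
def gradient(h, x, rel_step=1e-6):
    """Central-difference approximation of the sensitivities c_i = dh/dx_i."""
    grad = []
    for i, xi in enumerate(x):
        step = rel_step * (abs(xi) if xi != 0.0 else 1.0)
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] = xi + step
        x_minus[i] = xi - step
        grad.append((h(x_plus) - h(x_minus)) / (2.0 * step))
    return grad


def propagate(h, x, cov):
    """Return y = h(x) and u^2(y) = sum_i sum_j c_i u(x_i, x_j) c_j."""
    c = gradient(h, x)
    n = len(x)
    var_y = sum(c[i] * cov[i][j] * c[j] for i in range(n) for j in range(n))
    return h(x), var_y


# Example: Y = X1 * X2 with uncorrelated inputs (made-up numbers).
y, var_y = propagate(lambda x: x[0] * x[1], [2.0, 3.0], [[0.01, 0.0], [0.0, 0.04]])
```

For the product model the sensitivities are $c_1 = x_2$ and $c_2 = x_1$, so the expected variance is $9 \cdot 0.01 + 4 \cdot 0.04 = 0.25$.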


12 Case II: Linear regression, $Y = Xa$

Linear regression is applied when there is a linear relationship $Y = Xa$ between some
observed quantities $Y$ and some unknown quantities $a$. The design matrix $X$ is made up
of known elements that may be given as specified functions of one or several independent
variables. In the notation of this paper, the measurement model for the linear regression
problem is

$$\zeta = Y = (Y_1, \ldots, Y_n)^T, \quad \beta = a = (a_1, \ldots, a_k)^T,$$

$$\mathbf{f}(\zeta, \beta) = Y - Xa = 0,$$

where $X^{(n,k)}$ is the known design matrix. The measured values are

$$z = y = (y_1, \ldots, y_n)^T$$

with known covariance matrix

$$\Sigma = u(z, z^T) = \begin{pmatrix} u^2(y_1) & \cdots & u(y_1, y_n) \\ \vdots & \ddots & \vdots \\ u(y_n, y_1) & \cdots & u^2(y_n) \end{pmatrix}.$$

The coefficient matrix $D$ of the normal equations is

$$D = \begin{pmatrix} 0^{(k,k)} & 0^{(k,n)} & -X^T \\ 0^{(n,k)} & \Sigma^{-1} & I^{(n,n)} \\ -X & I^{(n,n)} & 0^{(n,n)} \end{pmatrix}.$$

Again, the solution to the normal equations is found after one iteration,

$$\hat{a} = \hat{\beta} = C X^T \Sigma^{-1} y, \quad \hat{Y} = \hat{\zeta} = X\hat{a}, \quad \lambda = -\Sigma^{-1}(\hat{Y} - y),$$
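where $C = (X^T \Sigma^{-1} X)^{-1}$. The associated covariances are given by the blocks of the
inverse of the coefficient matrix,

$$D(\hat{\beta}, \hat{\zeta})^{-1} = \begin{pmatrix} C & C X^T & -C X^T \Sigma^{-1} \\ XC & XCX^T & I - XCX^T\Sigma^{-1} \\ -\Sigma^{-1}XC & I - \Sigma^{-1}XCX^T & \Sigma^{-1}XCX^T\Sigma^{-1} - \Sigma^{-1} \end{pmatrix},$$

in which the upper left blocks are $u(\hat{a}, \hat{a}^T)$, $u(\hat{a}, \hat{Y}^T)$, $u(\hat{Y}, \hat{a}^T)$ and $u(\hat{Y}, \hat{Y}^T)$, and the
lower right block is $-u(\lambda, \lambda^T)$; that is,

$$\hat{a} = C X^T \Sigma^{-1} y, \quad u(\hat{a}, \hat{a}^T) = C = (X^T \Sigma^{-1} X)^{-1},$$

as is known from the theory of linear regression.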
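The regression result can be reproduced with a short, dependency-free sketch. It assumes uncorrelated observations, so that $\Sigma$ is diagonal with entries $u^2(y_i)$; the function names are hypothetical and the small Gauss-Jordan solver is included only to keep the example self-contained.

```python
def solve(a, b):
    """Solve a x = b (b may have several columns) by Gauss-Jordan elimination."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def weighted_regression(X, y, u_y):
    """Return a_hat = C X^T S^-1 y and C = (X^T S^-1 X)^-1 for diagonal S."""
    n, k = len(X), len(X[0])
    w = [1.0 / (u * u) for u in u_y]
    normal = [[sum(w[r] * X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
              for i in range(k)]
    rhs = [sum(w[r] * X[r][i] * y[r] for r in range(n)) for i in range(k)]
    identity = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    C = solve(normal, identity)
    a_hat = [sum(C[i][j] * rhs[j] for j in range(k)) for i in range(k)]
    return a_hat, C


# Straight line y = a1 + a2 * t observed at t = 0, 1, 2 with unit uncertainties.
a_hat, C = weighted_regression([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]],
                               [1.0, 3.0, 5.0], [1.0, 1.0, 1.0])
```

The matrix `C` returned here is the covariance matrix $u(\hat{a}, \hat{a}^T)$ of the fitted parameters; its diagonal gives their squared standard uncertainties.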


13 Case III: Repeated observations of a single quantity

Assume that a quantity $X$ is measured $n$ times with the same uncertainty $\sigma$. Such a
measurement can be modelled by $n$ quantities $X_1, \ldots, X_n$ having a common value $\mu$,

$$\zeta = X = (X_1, \ldots, X_n)^T, \quad \beta = (\mu),$$

$$\mathbf{f}(\beta, \zeta) = \begin{pmatrix} X_1 - \mu \\ \vdots \\ X_n - \mu \end{pmatrix} = 0.$$

The measured values are

$$z = x = (x_1, \ldots, x_n)^T,$$

and under the assumption that the measurement results are mutually independent, the
associated covariance matrix is given by

$$\Sigma = u(z, z^T) = \begin{pmatrix} \sigma^2 & & 0 \\ & \ddots & \\ 0 & & \sigma^2 \end{pmatrix}.$$

The coefficient matrix $D$ of the normal equations is

$$D = \begin{pmatrix} 0^{(1,1)} & 0^{(1,n)} & -1^{(1,n)} \\ 0^{(n,1)} & \sigma^{-2} I^{(n,n)} & I^{(n,n)} \\ -1^{(n,1)} & I^{(n,n)} & 0^{(n,n)} \end{pmatrix},$$

where $1$ denotes a matrix with all elements equal to 1. The solution of the normal
equations is found after one iteration,

$$\hat{\mu} = \hat{\beta} = \frac{1}{n} \sum_{i=1}^{n} x_i, \quad \hat{X} = \hat{\zeta} = \hat{\mu}\, 1^{(n,1)}, \quad \lambda = -\sigma^{-2}(\hat{X} - x).$$
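The associated covariances are given by the blocks of the inverse of the coefficient matrix,

$$D(\hat{\beta}, \hat{\zeta})^{-1} = \begin{pmatrix} \sigma^2 n^{-1} & \sigma^2 n^{-1} 1^{(1,n)} & -n^{-1} 1^{(1,n)} \\ \sigma^2 n^{-1} 1^{(n,1)} & \sigma^2 n^{-1} 1^{(n,n)} & I^{(n,n)} - n^{-1} 1^{(n,n)} \\ -n^{-1} 1^{(n,1)} & I^{(n,n)} - n^{-1} 1^{(n,n)} & -\sigma^{-2}\left(I^{(n,n)} - n^{-1} 1^{(n,n)}\right) \end{pmatrix},$$

in which the upper left blocks are $u^2(\hat{\mu})$, $u(\hat{\mu}, \hat{X}^T)$, $u(\hat{X}, \hat{\mu})$ and $u(\hat{X}, \hat{X}^T)$, and the
lower right block is $-u(\lambda, \lambda^T)$.

As expected,

$$\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i, \quad u^2(\hat{\mu}) = \sigma^2 / n.$$

If $\sigma^2$ is not known a priori, it can be estimated by solving the equation

$$\chi^2(\hat{\zeta}; z) = \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \hat{\mu})^2 = \nu = n - 1,$$

which gives

$$\sigma = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \hat{\mu})^2},$$

which is the well known expression for the experimental standard deviation $s$.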
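A quick numerical check of this special case, as a hedged sketch with hypothetical function names and made-up readings: solving $\chi^2 = n - 1$ for $\sigma$ should reproduce the sample standard deviation computed by Python's standard library.

```python
import math
import statistics


def mean_and_uncertainty(x, sigma):
    """Estimate mu_hat = mean(x) and its standard uncertainty sigma / sqrt(n)."""
    n = len(x)
    return sum(x) / n, sigma / math.sqrt(n)


def sigma_from_chi2(x):
    """Solve sum((x_i - mu_hat)^2) / sigma^2 = n - 1 for sigma."""
    n = len(x)
    mu_hat = sum(x) / n
    return math.sqrt(sum((xi - mu_hat) ** 2 for xi in x) / (n - 1))


# Made-up repeated readings of a single quantity.
readings = [9.8, 10.1, 10.0, 10.3, 9.8]
s = sigma_from_chi2(readings)
s_reference = statistics.stdev(readings)  # sample standard deviation
mu_hat, u_mu = mean_and_uncertainty(readings, s)
```

The two values `s` and `s_reference` agree to rounding, which is the statement made at the end of this section.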

14  Conclusion 

A  general  technique  for  evaluation  of  measurements  by  the  method  of  Least  Squares 
has  been  presented.  The  applicability  of  the  method  has  been  demonstrated  by  two 
examples.  It  has  been  shown  that  the  method  is  fully  compatible  with  the  generally 
accepted  principles  for  evaluation  of  measurement  uncertainty  laid  down  in  the  GUM 
and  that  ordinary  linear  regression  is  just  a  special  case  of  the  method. 

The  input  to  the  method  consists  of 

•  An  estimate  of  the  value  of  each  measured  quantity,  including  any  relevant  influence 
quantity. 

• The covariance matrix of these estimates formed by the standard uncertainties of
the estimates and the correlation coefficients between the estimates.

• A measurement model describing all the known relations between the measured
quantities and some additional quantities (if needed) for which no prior information
is available.

The  output  of  the  method  consists  of 

•  An  adjusted  estimate  of  the  value  of  each  measured  quantity  and  an  estimate  of 
each  additional  quantity  introduced  in  the  measurement  model. 

•  The  covariance  matrix  of  all  these  estimates  from  which  the  standard  uncertainties 
and  correlation  coefficients  can  be  calculated. 

• A chi-square value which is a measure of the degree of consistency between the
measurement model, the input estimates, and the covariances of the input quantities.

The adjusted estimate of the value of a measured quantity differs from the input
estimate only if the measurement model imposes additional information regarding the
value of that particular quantity. In that case the standard uncertainty of the adjusted
estimate will be smaller than the standard uncertainty of the input estimate. For a
good measurement, the difference between the adjusted estimate and the input estimate
of a measured quantity should not be large compared to the standard uncertainty of
that difference. It has therefore been suggested that the ratio $d$ of the difference to its
standard uncertainty is calculated and assessed against a selected criterion, e.g. $|d| < 2$.
By plotting the $d$ values of the adjusted estimates it is possible to assess whether an
excessively high chi-square value is caused by a few poor input estimates or is due to a poor model.

Abstract

This paper gives an overview of the similarities and differences between the requirements
and techniques used in mathematical approximation theory and filtration in surface
metrology. Although the two fields tend to use the same or similar mathematical objects to
produce functions that simplify a function in a controlled manner, it is the way that this
simplification is achieved which is the main difference between the two. Approximation
theory uses norms to judge the closeness of the approximation while filtration uses the
concept of wavelength to control the “smoothness” of the result of filtration. The new ISO
definition of a filter is stated, together with a generalisation of the concept of wavelength
through “brickwall” filters. This new ISO definition of a filter illustrates the closeness
of approximation theory and filtration. The paper then proceeds to survey some recent
developments in filtration in the hope that there can be some cross-fertilisation between
approximation theory and filtration. These include wavelets, robust filters and non-linear
filters such as the family of morphological filters, which includes envelope filters and
alternating sequence filters (non-linear multiresolution). Examples from surface texture are
used throughout the paper.


1  Introduction 

This paper gives an overview of the similarities and differences between the requirements
and techniques used in mathematical approximation theory and filtration in surface
metrology. It is not the intention of this paper to give full mathematical detail but
to survey recent developments in filtration in the hope that there can be some
cross-fertilisation between approximation theory and filtration.

Although the two fields tend to use the same or similar mathematical objects to
produce functions that simplify the original function in a controlled manner, it is the
way that this simplification is achieved which is the main difference between the two.

Mathematical approximation theory is concerned with best and good approximation
of a large family of functions from a smaller set (usually finitely generated, linear or
non-linear) in certain normed spaces (such as $L_p$), the construction of good
approximants (if possible) and the determination of approximation order. Classical tools to
achieve this include polynomial tools and splines. More recent tools include wavelets
and multiresolution that decompose the normed spaces.

Filtration uses the concept of “wavelength” to control the “smoothness” of the result
of filtration. In surface metrology, filtration is concerned with the extraction of features
within a prescribed “wavelength” band defined by “wavelength cut-offs”. Classical tools
to achieve this include Gaussian filters [1], polynomials and splines [4]. Recently there
has been a resurgence of activity, both fundamentally and practically, in filtration for
surface metrology.

The International Standards Organisation Technical Committee 213 (ISO/TC 213),
whose remit includes surface metrology, has recently set up an Advisory Group (AG9) to
explore filtration for surface metrology. They are producing a series of technical
specifications (ISO/TS 16610 series [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]) to standardise filter terminology
and to introduce to industry other filtration tools, which include spline wavelets [5],
morphological filters [9] and scale-space techniques [10].

Other groups are also developing filtration methods for surface metrology. The University of
Huddersfield has used second generation wavelets to produce an improved spline
wavelet [12]. The University of Hanover is exploring robust Gaussian filtration [6]. PTB has
developed a Robust Spline filter [7]. The rest of the paper surveys some of the results of
this recent activity.

2  Basic  concepts  of  filtration 

This  section  is  a  summary  of  the  basic  concepts  of  filtration  as  given  in  ISO/TS  16610 
part  1  [2]. 

Let $V$ be the space of real surfaces.

Let $V_\lambda$ be a set of nested subspaces indexed by $\lambda \in \mathbb{R}^+$ (here $\mathbb{R}^+$ is the set of positive
reals which includes zero) such that

$$\forall \lambda > \mu > 0: \quad V_\lambda \subset V_\mu \subset V \quad \text{and } V_0 \text{ is dense in } V.$$

The nesting index $\lambda$ is a number indicating the relative level of nesting for a particular
subspace in such a way that given a particular nesting index, subspaces with lower indices
contain more surface information and subspaces with higher nesting indices contain less
surface information. By convention, as the nesting index approaches zero there exists a
surface in that indexed subspace that approximates the real surface to within any given
measure of closeness as defined by a suitable norm. Thus approximation theory is used
to define filtration. The usual norm used in filtration is $L_2$ but others are used, such as
the one-sided Chebychev norm for morphological filters.

Let $\Phi_\lambda : V \to V_\lambda$ be a projection from the space of real surfaces onto the subspace
indexed by $\lambda > 0$ which satisfies the following two properties.

• The sieve criterion: $\forall \lambda, \mu > 0$ and $\forall a \in V$: $\Phi_\lambda(\Phi_\mu(a)) = \Phi_{\sup(\lambda, \mu)}(a)$.

• The projection criterion: $\forall \lambda > 0$ and $\forall a \in V_\lambda$: $\Phi_\lambda(a) = a$.

$\Phi_\lambda$ is called the brickwall filter (or primary mapping) and is a method of choosing a
particular surface belonging to a subspace with a specified nesting index, to represent
the real surface, which satisfies the projection and sieve criteria [16].

The  sieve  criterion  allows  brickwall  filters  to  have  the  property  that  once  the  surface 
has  been  brickwall  filtered  at  a  particular  nesting  index,  subsequent  brickwall  filtering 
with  a  higher  nesting  index  will  produce  the  same  surface  as  brickwall  filtering  the 
original  surface  with  the  brickwall  filter  with  the  higher  nesting  index. 

The projection criterion is required in order that the nesting index is a scale or size.
To see this, define the set operator $\Psi_\lambda : V \to V$ as

$$\forall \lambda > 0 \text{ and } \forall P \subset V: \quad \Psi_\lambda(P) := \{p : p \in P \text{ and } \Phi_\lambda(p) = p\}.$$

That is to say, $p \in \Psi_\lambda(P)$ if and only if $p \in P$ and $\Phi_\lambda(p) = p$. Then it is easily
demonstrated that the set operator $\Psi_\lambda$ is a granulometry [16] on $V$ and $\lambda$ is the scale/size
of the granulometry.

Since  the  nesting  index  of  brickwall  filters  is  a  scale/size  and  it  satisfies  the  sieve 
criterion,  it  can  be  used  to  define  the  generalised  concept  of  wavelength.  An  example 
of  a  brickwall  filter  is  a  morphological  closing  filter  using  a  sphere  as  the  structuring 
element.  Here  the  nesting  index  is  the  radius  of  the  sphere. 
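The two criteria can be checked numerically on a sampled profile. The sketch below is a simplified illustration, not the ISO/TS 16610 definition: it replaces the sphere by a flat horizontal line segment of half-width `radius` (in samples) as the structuring element, truncates the segment at the ends of the profile, and uses hypothetical function names.

```python
def dilate(profile, radius):
    """Running maximum over a window of half-width radius (truncated at the ends)."""
    n = len(profile)
    return [max(profile[max(0, i - radius):min(n, i + radius + 1)]) for i in range(n)]


def erode(profile, radius):
    """Running minimum over a window of half-width radius (truncated at the ends)."""
    n = len(profile)
    return [min(profile[max(0, i - radius):min(n, i + radius + 1)]) for i in range(n)]


def closing(profile, radius):
    """Morphological closing: dilation followed by erosion."""
    return erode(dilate(profile, radius), radius)


# Made-up profile heights; the nesting index is the half-width of the segment.
profile = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9]
closed_1 = closing(profile, 1)
closed_3 = closing(profile, 3)

# Projection criterion: filtering an already filtered profile changes nothing.
projection_ok = closing(closed_1, 1) == closed_1
# Sieve criterion: filtering at index 1 and then 3 equals filtering at index 3.
sieve_ok = closing(closed_1, 3) == closed_3 and closing(closed_3, 1) == closed_3
```

With this flat structuring element the closing fills in valleys narrower than the segment, so larger nesting indices retain less of the surface detail, in line with the convention stated above.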

Other  filters  can  be  constructed  using  brickwall  filters  (e.g.  weighted  mean  of  brickwall 
filters,  supremum  of  brickwall  filters,  etc.). 

3  Wavelet  filters 

An important example of the concepts discussed in the previous section is wavelet
filtration. The multiresolution form of the wavelet transform consists of constructing a
ladder of smooth approximations to the profile. The first rung is the original profile.
Each rung in the ladder consists of a filter bank where the profile $A_i$ is split into two
components, giving a smoother version $A_{i+1}$ of the profile, which becomes the next rung,
and a component $D_{i+1}$ that is the “difference” between the two rungs.

The multiresolution ladder structure lends itself naturally to a set of nested
mathematical models of the profile, with the $i$th model $m_i$ reconstructed from $(D_1, D_2, D_3, \ldots,
D_i, A_i)$. The nesting index is the order of the model: the higher the order, the smoother
the representation, with less detail. Thus $m_{i+1}$ is a smoother version of the profile than
$m_i$.

As part of a research programme at the University of Huddersfield, the use of
biorthogonal wavelets for surface analysis has been investigated because of their significant
merits [12]. A very fast, second-generation, in-place algorithm, which uses the lifting scheme,
has been developed at Bell Laboratories for biorthogonal wavelets [13]. One important
property of biorthogonal wavelets is that they allow the construction of symmetric
wavelets and thus linear phase filters, which preserve the location of surface features with far
less distortion than phase shift filters.


Overview of approximation theory and filtration

Surface texture analysis usually breaks down a surface into defined wavelength
components of the surface called roughness, waviness and form. There are many well-known
problems with the current standardised filter [14], i.e. the Gaussian filter [1], including lost
data at the edges, distortion due to form, retention of unwanted wavelengths, etc.
Huddersfield has investigated the possibility of using a ‘lifting wavelet’ model to overcome
some of these problems and enhance the extraction accuracy for roughness, waviness and
form. This is achieved by using the wavelet transform to break down the surface into
subsets at different scales and recombining only those subsets of the scales of interest
(i.e. setting all the other subsets to zero and applying the inverse wavelet transform).
Figure 1 shows the application of the wavelet filtering technique to a femoral head from an
artificial hip joint. Full details of the particular biorthogonal wavelet and its associated
lifting scheme together with some engineering applications are given in reference [12].
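
The decompose, zero and recombine procedure described above can be sketched in a few lines. The code below is only an illustration under stated assumptions: it uses the simple linear-prediction lifting steps of the biorthogonal (2,2) wavelet with periodic boundaries, not the particular wavelet of [12], and the names `lift_forward`, `lift_inverse` and `wavelet_filter` are hypothetical.

```python
import numpy as np

def lift_forward(a):
    """One rung of the ladder: split A_i into a smoother A_{i+1} and a detail D_{i+1}."""
    even, odd = a[0::2].copy(), a[1::2].copy()
    # Predict: each odd sample is predicted by the mean of its two even neighbours.
    detail = odd - 0.5 * (even + np.roll(even, -1))
    # Update: correct the even samples using the neighbouring details.
    smooth = even + 0.25 * (detail + np.roll(detail, 1))
    return smooth, detail

def lift_inverse(smooth, detail):
    """Undo lift_forward exactly by reversing the two lifting steps."""
    even = smooth - 0.25 * (detail + np.roll(detail, 1))
    odd = detail + 0.5 * (even + np.roll(even, -1))
    out = np.empty(even.size + odd.size)
    out[0::2], out[1::2] = even, odd
    return out

def wavelet_filter(profile, levels, keep):
    """Decompose, zero every detail level not listed in keep, then reconstruct."""
    smooth = np.asarray(profile, dtype=float)
    details = []
    for _ in range(levels):
        smooth, d = lift_forward(smooth)
        details.append(d)
    for level in range(levels, 0, -1):
        d = details[level - 1]
        if level not in keep:
            d = np.zeros_like(d)
        smooth = lift_inverse(smooth, d)
    return smooth

# Synthetic profile: a long-wavelength "form" plus a short-wavelength "roughness".
x = np.linspace(0.0, 1.0, 256, endpoint=False)
profile = np.sin(2 * np.pi * x) + 0.05 * np.sin(2 * np.pi * 60 * x)
full = wavelet_filter(profile, levels=3, keep={1, 2, 3})     # keep everything
smooth_only = wavelet_filter(profile, levels=3, keep=set())  # drop all details
roughness = profile - smooth_only
```

Keeping all detail levels reproduces the profile, while dropping them leaves the smooth component; the profile length is assumed to be divisible by 2 raised to the number of levels.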


Fig.  1.  Metallic  femoral  head  showing  original,  reference  and  roughness  surfaces. 


4  Envelope  filters 

Traditional  linear  filters,  such  as  the  Gaussian  filter  [1],  produce  a  smoothed  mean  surface 
through  a  measured  surface.  Many  engineering  applications  of  functional  surfaces  involve 
mechanical  contact  where  the  envelope  of  the  surface  is  of  interest  rather  than  the  mean 
surface.  But  what  exactly  is  the  envelope  of  a  surface? 

The  following  are  defining  properties  of  the  envelope  of  a  surface  used  by  ISO  TC/213 
AG9  [8]: 

• the envelope filter must be Extensive, i.e., $\forall A,\ F(A) \geq A$,

• the envelope filter must be Increasing, i.e., $A \leq B$ implies $F(A) \leq F(B)$,

• the envelope filter must be Idempotent, i.e., $F(F(A)) = F(A)$,

where  A,  B  are  surfaces  and  F(A)  is  the  filtered  surface  of  surface  A. 

But  these  are  also  the  defining  properties  of  a  morphological  closing  filter  [15];  hence 
all  envelope  filters  are  morphological  closing  filters.  A  morphological  closing  filter  using 
a  disk  as  the  structuring  element  is  illustrated  in  Figure  2. 
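
The three defining properties can be checked numerically on a sampled profile. The following is a minimal sketch, assuming a uniformly sampled one-dimensional profile and a disk represented by the heights of its upper half; the helper names are hypothetical and this is not a standardised reference algorithm.

```python
import numpy as np

def disk_element(radius, dx):
    """Heights of the upper half of a disk, sampled with spacing dx."""
    k = int(np.floor(radius / dx))
    s = np.arange(-k, k + 1) * dx
    return np.sqrt(np.maximum(radius**2 - s**2, 0.0))

def dilate(z, g):
    """Grey-scale dilation: out[i] = max_j z[i - j] + g[j]."""
    k, n = len(g) // 2, len(z)
    padded = np.full(n + 2 * k, -np.inf)
    padded[k:k + n] = z
    out = np.full(n, -np.inf)
    for j in range(-k, k + 1):
        out = np.maximum(out, padded[k - j:k - j + n] + g[j + k])
    return out

def erode(z, g):
    """Grey-scale erosion: out[i] = min_j z[i + j] - g[j]."""
    k, n = len(g) // 2, len(z)
    padded = np.full(n + 2 * k, np.inf)
    padded[k:k + n] = z
    out = np.full(n, np.inf)
    for j in range(-k, k + 1):
        out = np.minimum(out, padded[k + j:k + j + n] - g[j + k])
    return out

def closing(z, g):
    """Morphological closing: dilation followed by erosion."""
    return erode(dilate(z, g), g)

x = np.linspace(0.0, 10.0, 501)
profile = 0.2 * np.sin(3.0 * x)
profile[250] -= 1.0                      # a narrow pit that the disk cannot enter
g = disk_element(radius=1.0, dx=x[1] - x[0])
envelope = closing(profile, g)
raised = profile + 0.1 * (x > 5.0)       # a second surface lying above the first
```

On this example the envelope lies on or above the profile, applying the closing twice changes nothing (up to rounding), and the closing of `raised` lies on or above the closing of `profile`.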

Unfortunately, envelope filters are, by definition, not very robust to outliers in the
surface, such as large spikes. Scale-space is an attempt to overcome this problem with
the morphological closing filter.

Fig. 2. An envelope filter using a closing filter with a disk as a structuring element.

5  Scale-space 

Scale-space  is  a  way  of  breaking  down  a  signal  or  image  into  objects  of  different  scales. 
To  define  scale-space  we  need  to  define  the  size  of  objects  in  a  signal  or  image.  This  is 
achieved  using  Alternating  Sequence  Filters  [10]. 

Alternating  Sequence  Filters  (ASFs)  are  defined  in  terms  of  matched  pairs  of  closing 
and  opening  filters.  A  closing  followed  by  an  opening  both  at  a  given  scale  (radius  of  the 
circle,  length  of  the  horizontal  segment,  etc.)  will  eliminate  features  of  the  surface  whose 
“scales”  are  smaller  than  the  given  scale. 

ASFs begin by eliminating very small features, then slightly larger features, and then
larger features still, in a systematic way up to
a given scale. Usually there is a constant ratio between successive scales. This process
produces a ladder structure similar to wavelet analysis. At each rung in the ladder the
profile is filtered by a matched pair of closing and opening filters at a given scale to
obtain the next rung profile and a component that is the “difference” between the two
rungs. The ladder structure leads to a multiresolution analysis, similar to wavelet
analysis, with all of the associated analysis techniques. An example of scale space of a profile
from a ceramic surface is given in Figure 3. The top part of this figure shows the original
non-smoothed profile with the final smoothed profile.
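
The ladder produced by an ASF can be illustrated as follows. This is a sketch only, assuming a flat horizontal line segment as structuring element (its half-width in samples is the scale) and a scale ratio of two; the function names are hypothetical.

```python
import numpy as np

def _sliding(z, k, op, fill):
    """Apply op (np.maximum or np.minimum) over a window of half-width k samples."""
    n = len(z)
    padded = np.concatenate([np.full(k, fill), z, np.full(k, fill)])
    out = padded[0:n].copy()
    for j in range(1, 2 * k + 1):
        out = op(out, padded[j:j + n])
    return out

def dilate(z, k):
    return _sliding(z, k, np.maximum, -np.inf)

def erode(z, k):
    return _sliding(z, k, np.minimum, np.inf)

def closing(z, k):
    return erode(dilate(z, k), k)

def opening(z, k):
    return dilate(erode(z, k), k)

def asf_ladder(profile, scales):
    """Return the rungs of the ladder and the differences between successive rungs."""
    rungs = [np.asarray(profile, dtype=float)]
    differences = []
    for k in scales:
        nxt = opening(closing(rungs[-1], k), k)   # matched pair at scale k
        differences.append(rungs[-1] - nxt)
        rungs.append(nxt)
    return rungs, differences

profile = np.zeros(64)
profile[10] = 5.0          # spike, one sample wide
profile[30:33] = -2.0      # pit, three samples wide
profile[45:52] = 1.0       # plateau, seven samples wide
rungs, differences = asf_ladder(profile, scales=[1, 2, 4])
```

In this example the spike disappears at the first rung, the pit at the second and the plateau at the third, and the differences together with the last rung add up to the original profile.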

6  Robustness 

Robustness of filtration is an increasingly important area of interest in surface metrology.
Robustness is not in general an absolute property of a filter but a relative one. One can
only say that a particular filter is more robust than an alternative filter against a
particular phenomenon if there is less distortion in that filter’s response to that phenomenon
than in the alternative filter’s response.

To make robustness an absolute property of filters we need to define a reference class
of profile filters with which to compare. The reference class of filters defined in ISO
TC/213 AG9 is the class of linear filters [3]. Hence by this definition all robust filters
must be non-linear. There are several well-known techniques (all non-linear) which can
produce robust filters for a particular phenomenon. These are indicated in the following
subsections.

Fig. 3. Successively smoothed profiles of a ceramic profile using an ASF with a disk.

6.1  Metric  based 

Here  the  metric  used  to  fit  the  filter  to  the  surface  is  altered  to  a  more  “robust”  metric. 

For example the metric based on the $L_1$ norm is more robust against spike
discontinuities than the metric based on the least squares norm ($L_2$ norm), which in turn is
more robust than the metric based on the Chebyshev norm ($L_\infty$ norm).

The Robust Spline Filter given in ISO/TS 16610 part 32 uses an $L_1$ metric rather
than the usual $L_2$ norm to make it more robust [7].
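
The ordering of the three norms can be seen in the simplest possible fit, that of a constant level to a set of heights. The sketch below uses made-up data (a flat profile with one spike); it illustrates the principle only and is not the spline filter of the standard.

```python
import numpy as np

heights = np.array([0.0] * 9 + [100.0])   # flat profile with one spike

l1_fit = np.median(heights)                          # minimises sum |z - c|
l2_fit = np.mean(heights)                            # minimises sum (z - c)^2
linf_fit = 0.5 * (heights.min() + heights.max())     # minimises max |z - c|

print(l1_fit, l2_fit, linf_fit)
```

The $L_1$ fit stays on the flat part of the profile, the $L_2$ fit is pulled one tenth of the way towards the spike, and the $L_\infty$ fit is pulled halfway.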

6.2  Robust  statistics 

Here  each  point  on  the  surface  is  weighted  according  to  its  relative  height  position  to  the 
filter’s  smooth  response,  with  points  further  away  being  given  less  influence  on  the  filter 
response  than  points  nearer  in  height.  This  is  an  attempt  to  make  the  filter  more  robust 
against  spike  discontinuities.  There  are  several  standard  functions  used  to  allocate  the 
weights  to  points  (Huber,  Beaton  functions,  etc.)  which  can  be  found  in  any  standard 
book on robust statistics [17].

The Robust Gaussian regression filter given in ISO/TS 16610 part 31 uses a Beaton
function to alter the influence of outliers [6].
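
The weighting idea can be sketched for the simplest case of estimating a single level. The code below assumes the Beaton-Tukey biweight with a scale derived from the median absolute deviation and the customary tuning constant 4.685; these choices and the function names are assumptions made for illustration and are not taken from ISO/TS 16610.

```python
import numpy as np

def biweight(residuals, c):
    """Beaton-Tukey biweight: (1 - (r/c)^2)^2 inside |r| < c, zero outside."""
    u = residuals / c
    return np.where(np.abs(u) < 1.0, (1.0 - u**2) ** 2, 0.0)

def robust_level(z, iterations=20):
    """Iteratively reweighted mean of the heights z."""
    level = np.median(z)
    for _ in range(iterations):
        r = z - level
        scale = 1.4826 * np.median(np.abs(r))    # MAD-based scale estimate
        if scale == 0.0:
            break
        w = biweight(r, 4.685 * scale)           # distant points get little or no weight
        level = np.sum(w * z) / np.sum(w)
    return level

heights = np.array([0.10, -0.20, 0.00, 0.15, -0.10, 0.05, 50.0])
level = robust_level(heights)
```

Here the spike at 50 receives zero weight, so the estimated level stays close to zero, whereas the ordinary mean is about 7.1.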

6.3  Pre-filtering 

Pre-filtering is a technique where phenomena (such as spikes, form, etc.) in the
surface are removed or greatly reduced, by other means, before filtration, thus removing
or greatly reducing any effect the phenomena can have on the filter’s response. This
approach has the advantage that once a method has been found to remove an unwanted
phenomenon then this method will work with any filter.

Form  pre-filtering,  involving  removing  the  form  of  the  surface  before  filtration,  is  a 
very  common  technique  used  in  surface  metrology.  Less  common  is  using  scale  space 
pre-filtering  which  involves  removing  singularities  and  other  features  of  a  certain  size 
before  filtration. 

7  Conclusions 

The paper has given an overview of the similarities and differences between the
requirements and techniques used in mathematical approximation theory and filtration in
surface metrology. Some recent work on filtration has been reported. It is hoped that
this paper can generate some cross-fertilisation between the two areas of approximation
theory and filtration.

Abstract

In this paper we consider an application of Sobolev-orthogonal functions and radial basis
functions to the numerical solution of partial differential equations. We develop the
fundamentals of a spectral method, present examples via reaction-diffusion partial differential
equations and discuss briefly some links with the theory of wavelets.

1  Introduction 

Radial basis functions are a well-known and useful tool for functional approximation in
one or more dimensions. The general form of approximations is always a linear
combination of a finite or infinite number of shifts of a single function, the radial basis function.
In more than one dimension, this function is made rotationally invariant by composing
a univariate function, usually called $\phi$, with the Euclidean norm. In one dimension such
approximation usually simplifies to univariate polynomial splines. For a recent review of
radial basis function approximations, see [5].

This note is about applications of radial basis functions and other approximation
schemes such as Sobolev-orthogonal polynomials and more general Sobolev-orthogonal
functions to the numerical solution of partial differential equations. The basic ideas stem
from the theory of Sobolev-orthogonal polynomials ([13]), and in this paper a
remarkable connection is developed between applications of Sobolev-orthogonality and
radial basis functions (e.g. [5]); wavelets are mentioned as well (e.g. [8, 9]).
Sobolev-orthogonal polynomials are a device to extend the standard theory of orthogonal
polynomials (see, for instance, [12]) by requiring orthogonality with respect to non-selfadjoint
inner products of the form

$$(f,g)_\lambda = \int_a^b f(x)g(x)\,dx + \lambda \int_a^b f'(x)g'(x)\,dx$$

for a positive parameter $\lambda$ and a suitable interval $(a,b)$, $a, b \in \mathbb{R} \cup \{\pm\infty\}$. The $dx$ in
the two integrals is often replaced by more general Borel measures, $d\psi$, say. The scheme
which we want to discuss in this short article is one of spectral type: in lieu of e.g.
finite element spaces as underlying piecewise polynomial approximation spaces for the
solution, we take purpose-built approximations which make the linear systems which we
need to solve particularly simple, sometimes even diagonal.

Therefore, in the first instance, we develop a theory of applying Sobolev-orthogonal
polynomial basis functions for the numerical solution of partial differential equations via
a spectral method. Then we extend this idea to general classes of radial basis
function-type methods, where shift-invariant approximation spaces are generated with
Sobolev-orthogonal basis functions. Due to the introductory character of this paper, our
discussion is restricted to relatively simple cases. Our presentation is illustrated with the
one-dimensional reaction-diffusion partial differential equation.

This is the place to note that radial basis functions have found a number of other
applications in the discretisation of PDEs. Thus, for example, Driscoll and Fornberg [10]
have used fast-converging ‘flat’ multiquadrics in pseudospectral methods, while Frank
and Reich [11] applied radial basis functions with particle methods in order to conserve
enstrophy in the solution of certain shallow-water equations. Our application is of an
altogether different nature.

1.1  Examples  of  PDEs  and  Sobolev-orthogonality 

Consider the partial differential equation

$$\frac{\partial u}{\partial t} = \nabla(a \nabla u) + bu + c, \tag{1.1}$$

where $u = u(\mathbf{x}, t)$ is of sufficient smoothness with respect to $\mathbf{x}$ and $t$, $\mathbf{x}$ is given in a cube
$\mathcal{V} \subset \mathbb{R}^d$ (more generally, in a finite domain), $t \geq 0$, $a = a(\mathbf{x}) > 0$, $b = b(\mathbf{x})$ and $c = c(\mathbf{x})$.
We impose zero Dirichlet boundary conditions. The stipulation of a cube as a domain and
zero Dirichlet conditions is unduly restrictive, but it will suffice for the short presentation
in this paper and adequately illustrate the main novel concepts in our presentation. In
the next section, we shall also introduce a nonlinearity into the underlying PDE.

We wish to approximate the solution $u(\mathbf{x},t)$ as a finite linear combination of the
generic form

$$u(\mathbf{x},t) \approx \sum_{l=1}^{m} \alpha_l(\mathbf{x})\, w_l(t),$$

where $t$ is nonnegative and $\mathbf{x}$ resides in the domain. In the sequel we shall also use
expansions into infinite series with $l \in \mathbb{Z}$. Thus, a Galerkin ansatz (in the usual $L_2$ inner
product on $\mathbb{R}^d$ which we denote by $(\cdot,\cdot)$ in contrast to the specialised Sobolev inner
product $(\cdot,\cdot)_\lambda$ above) gives

$$\sum_{l=1}^{m} (\alpha_l,\alpha_k)\, w_l' = \sum_{l=1}^{m} (\nabla(a\nabla\alpha_l),\alpha_k)\, w_l + \sum_{l=1}^{m} (b\alpha_l,\alpha_k)\, w_l + (c,\alpha_k), \qquad k = 1,2,\ldots,m.$$

Integration by parts in the second term above and substitution of the requisite zero
boundary conditions yield the alternative formulation

$$\sum_{l=1}^{m} (\alpha_l,\alpha_k)\, w_l' = -\sum_{l=1}^{m} (a\nabla\alpha_l,\nabla\alpha_k)\, w_l + \sum_{l=1}^{m} (b\alpha_l,\alpha_k)\, w_l + (c,\alpha_k), \qquad k = 1,2,\ldots,m. \tag{1.2}$$


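We solve the ODE system (1.2) with respect to $t$, for example with the backward Euler
scheme (we use backward Euler for the sake of simplicity, but it should be noted that
the same analysis applies to any implicit multistep method, because our use of
Sobolev-orthogonality is only linked to the implicitness of the solution method)

$$w_l^{n+1} = w_l^n + \Delta t\, F_l(\mathbf{w}^{n+1}), \qquad n \in \mathbb{Z}_+, \quad l = 1,2,\ldots,m, \tag{1.3}$$

where the function $F_l$ is given implicitly by the equations (1.2) and where $\mathbf{w}^{n+1}$ in
the expression above is the vector with components $w_l^{n+1}$, $l = 1,2,\ldots,m$. Let us now
multiply expression (1.3) by $(\alpha_l,\alpha_k)$ and sum up for $l = 1,2,\ldots,m$. Then, exploiting
(1.2), a little algebra yields

$$\sum_{l=1}^{m} \left\{ \int_{\mathcal{V}} [1 - \Delta t\, b(\mathbf{x})]\,\alpha_l(\mathbf{x})\alpha_k(\mathbf{x})\,d\mathbf{x} + \Delta t \int_{\mathcal{V}} a(\mathbf{x}) \nabla^T\alpha_l(\mathbf{x}) \nabla\alpha_k(\mathbf{x})\,d\mathbf{x} \right\} w_l^{n+1} = \sum_{l=1}^{m} \int_{\mathcal{V}} \alpha_l(\mathbf{x})\alpha_k(\mathbf{x})\,d\mathbf{x}\; w_l^n + \Delta t \int_{\mathcal{V}} c(\mathbf{x})\alpha_k(\mathbf{x})\,d\mathbf{x}. \tag{1.4}$$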

The connection with Sobolev inner products is clear. Indeed, let us now choose the set
$W_{m,n} := \{w_1, w_2, \ldots, w_m\}$ as a set of functions that are orthogonal with respect to the
homogeneous Sobolev $\mathring{H}^{d,2}$ inner product (see, e.g., [13])

$$\langle f, g\rangle_{\Delta t} := \int_{\mathcal{V}} [1 - \Delta t\, b(\mathbf{x})]\, f(\mathbf{x}) g(\mathbf{x})\,d\mathbf{x} + \Delta t \int_{\mathcal{V}} a(\mathbf{x}) \nabla^T f(\mathbf{x}) \nabla g(\mathbf{x})\,d\mathbf{x} \tag{1.5}$$

(this of course requires that $\Delta t\, b(\mathbf{x}) < 1$, hence may restrict in a minor way the choice of
the time step $\Delta t$). Further below we shall also use infinite sets $W$ instead of the finite
set $W_{m,n}$. It is important to note that in general the Sobolev inner product depends
upon the step size. Subject to this formulation, the linear system (1.4) diagonalises and
its numerical solution becomes trivial. We turn now to a more elaborate example in the
next subsection, namely the reaction-diffusion equation.
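
The diagonalisation can be illustrated numerically. The sketch below is an illustration under stated assumptions rather than the construction of this paper: it takes one space dimension with $a = 1$, $b = c = 0$, starts from ordinary hat functions on a uniform grid, and obtains a basis that is orthonormal in the inner product (1.5) by a Cholesky factorisation of its Gram matrix (Gram-Schmidt in matrix form). All variable names are hypothetical.

```python
import numpy as np

m = 9                      # number of interior hat functions (illustrative choice)
h = 1.0 / (m + 1)
dt = 1e-3

off = np.ones(m - 1)
M = h / 6.0 * (4.0 * np.eye(m) + np.diag(off, 1) + np.diag(off, -1))   # L2 products of hats
K = 1.0 / h * (2.0 * np.eye(m) - np.diag(off, 1) - np.diag(off, -1))   # products of derivatives

G = M + dt * K             # Gram matrix of the Sobolev inner product with a = 1, b = 0

# G = L L^T; the columns of B = L^{-T} hold the coefficients of a
# Sobolev-orthonormal basis expressed in terms of the hat functions.
L = np.linalg.cholesky(G)
B = np.linalg.inv(L).T

G_orth = B.T @ G @ B       # Gram matrix in the new basis
M_orth = B.T @ M @ B

x = np.linspace(h, 1.0 - h, m)
w0 = np.sin(np.pi * x)                     # coefficients at time level n

w1_direct = np.linalg.solve(G, M @ w0)     # backward Euler step in the hat basis
w1_orth = B @ (M_orth @ (L.T @ w0))        # same step in the new basis, no linear solve
```

In the new basis the matrix on the left-hand side of (1.4) is the identity, so a time step reduces to a matrix-vector product, and both routes give the same result.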

1.2  Reaction-diffusion  as  a  paradigm  for  nonlinear  PDEs 

Let us consider the nonlinear partial differential equation

$$\frac{\partial u}{\partial t} = \nabla(a\nabla u) + f(u), \tag{1.6}$$

where otherwise all the quantities are as in (1.1), including the boundary conditions.
Suppose that an approximation $u^n$ to $u(\mathbf{x}, n\Delta t)$ is available at all the spatial grid points.
We commence by interpolating $u^n$ to requisite precision by some function $v$. Thus, $v$ is
defined throughout the cube $\mathcal{V}$ and coincides with $u^n$ at the grid points. This allows us
to linearise the source function $f$ about $u^n$, the outcome being

$$\frac{\partial u}{\partial t} = \nabla(a\nabla u) + c + bu + g(u), \tag{1.7}$$

where

$$b(\mathbf{x}) = f'(v(\mathbf{x})),$$
$$c(\mathbf{x}) = f(v(\mathbf{x})) - f'(v(\mathbf{x}))v(\mathbf{x}),$$
$$g(\mathbf{x},u) = f(u) - f(v(\mathbf{x})) - f'(v(\mathbf{x}))[u - v(\mathbf{x})].$$

Note that

$$g(\mathbf{x},u) = \mathcal{O}(|u-v|^2).$$


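We can now solve the nonlinear system (1.7) by functional iteration, i.e. by letting as a
start

$$w_l^{n+1,0} = w_l^n, \qquad l = 1,2,\ldots,m,$$

and recurring, employing the inner product (1.5),

$$\sum_{l=1}^{m} \langle \alpha_l, \alpha_k\rangle_{\Delta t}\, w_l^{n+1,j+1} = \sum_{l=1}^{m} (\alpha_l,\alpha_k)\, w_l^n + \Delta t \left[ (c,\alpha_k) + \left( g\Big(\,\cdot\,, \sum_{l=1}^{m} \alpha_l w_l^{n+1,j}\Big), \alpha_k \right) \right], \qquad k = 1,2,\ldots,m, \tag{1.8}$$

for $j \in \mathbb{Z}_+$.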


If, as in the previous subsection, we choose $W_m$ so as to diagonalise the linear
system, each step of (1.8) becomes relatively cheap. Hence this approach might offer a
realistic means to derive spectral approximations to nonlinear PDEs. Indeed, a special
one-dimensional case can be treated straightforwardly and it is presented in the sequel.
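
The structure of the functional iteration can be seen in a scalar analogue, in which diffusion is dropped and a single unknown replaces the coefficient vector. The sketch below rests on these simplifying assumptions; the source term $f(u) = u(1-u)$ and the function names are chosen purely for illustration.

```python
def f(u):
    return u * (1.0 - u)          # illustrative source term (an assumption)

def df(u):
    return 1.0 - 2.0 * u

def backward_euler_step(w_n, dt, iterations=20):
    v = w_n                        # linearise about the current value
    b = df(v)
    c = f(v) - df(v) * v
    g = lambda u: f(u) - f(v) - df(v) * (u - v)   # remainder, O(|u - v|^2)
    w = w_n                        # starting value w^{n+1,0} = w^n
    for _ in range(iterations):
        # linear part treated implicitly, remainder g lagged by one iterate
        w = (w_n + dt * (c + g(w))) / (1.0 - dt * b)
    return w

w_n, dt = 0.2, 0.1
w_next = backward_euler_step(w_n, dt)
residual = w_next - w_n - dt * f(w_next)   # backward Euler residual at the limit
```

Because the remainder $g$ is quadratically small, the iteration contracts quickly and its limit satisfies the fully nonlinear backward Euler equation; each iterate costs only a division, the scalar counterpart of a diagonal linear system.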

1.3  The  one-dimensional  case  using  polynomial  splines 

Let (1.1) be given in one space dimension and without source terms, whence it becomes
the familiar diffusion equation with variable diffusion coefficient,

$$\frac{\partial u}{\partial t} = \frac{\partial}{\partial x}\left( a \frac{\partial u}{\partial x} \right).$$

Thus, provided that $0 \leq x \leq 1$ and $t$ is nonnegative, we require the ‘usual’ Sobolev
orthogonality [13] with respect to the inner product

$$\langle f,g\rangle_{\Delta t} = (f,g)_1 = \int_0^1 f(x)g(x)\,d\varphi(x) + \int_0^1 f'(x)g'(x)\,d\psi(x), \qquad \frac{d\varphi}{dx} = 1 - \Delta t\, b, \quad \frac{d\psi}{dx} = \Delta t\, a.$$

We  emphasise  again  the  dependence  of  the  Sobolev-inner  product  on  the  step  size.  Taking 
the  approach  of  the  previous  subsection  as  our  point  of  departure,  an  obvious  option  is 
to  use  Sobolev-orthogonal  polynomials.  An  alternative  approach  which  can  be  worked 
out  explicitly  and  which  we  wish  to  demonstrate  in  this  subsection,  is  to  use  univariate 
polynomial  spline  approximations.  It  has  the  advantage  of  being  more  amenable  to  a 
generalisation  to  several  space  dimensions. 

We suppose that the unit interval $[0,1]$ is divided into $N$ intervals of length $h := 1/N$
and consider a piecewise-quadratic basis of continuous functions $s_1, s_2, \ldots, s_N$ such that

$$s_l(x) := \begin{cases} \frac{1}{h}[x - (l-1)h] + \alpha_l (x - lh)[x - (l-1)h], & (l-1)h \leq x \leq lh, \\ \frac{1}{h}[(l+1)h - x] + \beta_l (x - lh)[x - (l+1)h], & lh \leq x \leq (l+1)h, \\ 0, & |x - lh| \geq h. \end{cases}$$

Clearly, $s_l$ is a continuous, $C[0,1]$, cardinal function of Lagrange interpolation at the
knots (hence, a quadratic spline with double knots, cf. Powell [16], the added degree of
freedom taken up by the requirement of Sobolev-orthogonality). Next, we need just to
impose Sobolev orthogonality, and solve for the coefficients $\alpha_l$ and $\beta_l$. This is equivalent
to the requirement that

$$\langle s_l, s_{l+1}\rangle_{\Delta t} = 0, \qquad l = 1,2,\ldots,N-1.$$

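In the special case $a(x) = 1$, $b(x), c(x) = 0$, we have $\varphi(x) = x$, $\psi(x) = \Delta t\, x$ and

$$\begin{aligned}
\langle s_l, s_{l+1}\rangle_{\Delta t} &= \int_0^h \left[\frac{x}{h} + \alpha_{l+1}(x-h)x\right]\left[\frac{h-x}{h} + \beta_l x(x-h)\right] dx \\
&\quad + \Delta t \int_0^h \left(\frac{1}{h} + 2\alpha_{l+1}x - \alpha_{l+1}h\right)\left(-\frac{1}{h} + 2\beta_l x - \beta_l h\right) dx \\
&= h\int_0^1 \left[\xi + \alpha_{l+1}h^2(\xi-1)\xi\right]\left[(1-\xi) + \beta_l h^2 \xi(\xi-1)\right] d\xi \\
&\quad - \frac{\Delta t}{h}\int_0^1 \left[1 - \alpha_{l+1}h^2(1-2\xi)\right]\left[1 + \beta_l h^2(1-2\xi)\right] d\xi \\
&= h\left[\frac16 - \frac{h^2}{12}(\alpha_{l+1} + \beta_l) + \frac{h^4}{30}\alpha_{l+1}\beta_l\right] + \frac{\Delta t}{h}\left(-1 + \frac{h^4}{3}\alpha_{l+1}\beta_l\right).
\end{aligned}$$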


Let $\mu = \Delta t/h^2$ be the Courant number. Since we have two degrees of freedom for each
$l$ and because each equation is otherwise independent of $l$, we may fix $\alpha = \alpha_l = \beta_l$.
Then, letting $\tilde\alpha := h^2\alpha$, requiring $\langle s_l, s_{l+1}\rangle_{\Delta t} = 0$ is equivalent to

$$5 - 5\tilde\alpha + \tilde\alpha^2 + 10\mu\alpha^2 - 30\mu = 0 \tag{1.9}$$

or

$$(10\mu + h^4)\alpha^2 - 5h^2\alpha + 5 - 30\mu = 0.$$

We wish to solve this quadratic equation for $\alpha$ for a suitable range of Courant numbers.
Indeed, the equation (1.9) has two real solutions $\alpha$ for every $\mu > \frac16$ if $h$ is small enough,
since its discriminant is

$$(120\mu + 5)h^4 + 1200\mu^2 - 200\mu.$$

In the case $\mu = \frac16$ each $s_l$ reduces, upon the choice of $\alpha = 0$, to a chapeau function.
Otherwise we obtain $\alpha = \mathcal{O}(1)$. We may give up a small support, characteristic of
spline functions (which, anyway, is of marginal importance, since we do not solve linear
systems!). This is a case discussed in the next section. Another obvious alternative is to
construct an orthogonal basis from chapeau functions. This, however, is easily seen to
be identical to the LU factorization of the standard FEM matrix

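$$\begin{pmatrix}
\frac23 & \frac16 & 0 & 0 & 0 & \cdots & 0 \\
\frac16 & \frac23 & \frac16 & 0 & 0 & \cdots & 0 \\
0 & \frac16 & \frac23 & \frac16 & 0 & \cdots & 0 \\
\vdots & & \ddots & \ddots & \ddots & & \vdots \\
0 & \cdots & 0 & \frac16 & \frac23 & \frac16 & 0 \\
0 & \cdots & 0 & 0 & \frac16 & \frac23 & \frac16 \\
0 & \cdots & 0 & 0 & 0 & \frac16 & \frac23
\end{pmatrix}.$$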

2  Applications  of  radial  basis  functions  and  wavelets 

2.1  Sobolev-orthogonal  translates  of  a  radial  basis  function 

In this section, we wish to develop a more general approach employing the concepts
of wavelets and radial basis functions and employ shift-invariant spaces of
approximations for our spectral methods. We begin by giving up the compactness of the domain
$\mathcal{V}$ and work on the entire real line instead. For this, we shall demonstrate the use of
Sobolev inner products and shift-invariant spaces and concentrate solely on this part
of the analysis in the present article. So, in particular, the set $W$ above is of the form
$\{\phi(\cdot - nh) \mid n \in \mathbb{Z}\}$. In the sequel we shall add several remarks about how to find
compactly supported $\phi$ that allow the treatment of partial differential equations on
compact domains. We remark that $n$ is no longer used for the time-steps in the differential
equation solver but for the shifts of the radial functions.

To start with, we wish to find a function $\phi \in H^2(\mathbb{R})$, where $H^2(\mathbb{R})$ is a
non-homogeneous Sobolev space, such that for a positive constant $\lambda$ and positive spacing
$h$ it is true that

$$\int_{-\infty}^{\infty} \phi(x)\phi(x - hn)\,dx + \lambda \int_{-\infty}^{\infty} \phi'(x)\phi'(x - hn)\,dx = \delta_{0n}, \qquad n \in \mathbb{Z}. \tag{2.1}$$

We multiply both the left- and right-hand sides of the general pattern (2.1) by $\exp(i\theta n)$ and
sum over $n \in \mathbb{Z}$,

$$\sum_{n=-\infty}^{\infty} \exp(i\theta n) \left\{ \int_{-\infty}^{\infty} \phi(x)\phi(x - hn)\,dx + \lambda \int_{-\infty}^{\infty} \phi'(x)\phi'(x - hn)\,dx \right\} = 1, \qquad \theta \in [-\pi,\pi]. \tag{2.2}$$

In order to be able to exchange summation and integration and apply the Poisson
summation formula (Stein and Weiss [17], p. 252) we make a number of assumptions. The
version of the Poisson summation formula that we wish to use states that for a univariate
function $f$ with

$$|f(x)| = \mathcal{O}\big((1+|x|)^{-1-\varepsilon}\big), \qquad |\hat f(\xi)| = \mathcal{O}\big((1+|\xi|)^{-1-\varepsilon}\big)$$

and positive $\varepsilon$, the following equality holds (note that the first bound in the above implies
existence and continuity of the one-dimensional Fourier transform)

$$\sum_{j=-\infty}^{\infty} \hat f(\theta + 2\pi j) = \sum_{j=-\infty}^{\infty} \exp(i\theta j) f(j).$$
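
The summation formula can be checked numerically for a function whose transform is known in closed form. The sketch below assumes the Gaussian $f(x) = e^{-x^2/2}$ and the convention $\hat f(\xi) = \int f(x)e^{-ix\xi}\,dx$, under which $\hat f(\xi) = \sqrt{2\pi}\,e^{-\xi^2/2}$; since $f$ is even, the sign convention in the exponent does not affect the identity.

```python
import numpy as np

def f(x):
    return np.exp(-0.5 * x**2)

def f_hat(xi):
    # Fourier transform of f under the convention stated in the text above
    return np.sqrt(2.0 * np.pi) * np.exp(-0.5 * xi**2)

j = np.arange(-30, 31)                    # truncated index range; the tails are negligible
theta = np.linspace(-np.pi, np.pi, 9)

lhs = np.array([np.sum(f_hat(t + 2.0 * np.pi * j)) for t in theta])
rhs = np.array([np.sum(np.exp(1j * t * j) * f(j)) for t in theta])
```

Both sides agree to rounding error for every sampled $\theta$, and the right-hand side is real, as it must be for an even function.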

Specifically, we assume that the following three decay estimates hold:

$$|\phi(x)| \leq c(1+|x|)^{-1-\varepsilon},$$
$$|\phi'(x)| \leq c(1+|x|)^{-1-\varepsilon},$$
$$|\hat\phi(\xi)| \leq c(1+|\xi|)^{-3-\varepsilon},$$

where $c$ is a generic positive constant, $\varepsilon > 0$, $\hat\phi$ denotes the Fourier transform and we
demand the faster rate of decay in the last display because we shall later require
summability of translates of the Fourier transform multiplied by the square of its argument. Note
in particular that the first decay condition renders the Fourier transform $\hat\phi$ continuous
and well defined.

An example of a function $\phi$ that satisfies the three decay conditions above is the
second divided difference of the multiquadric radial basis function [4] $\sqrt{r^2 + C^2}$, that is

$$\phi(x) = \tfrac12\sqrt{(x-1)^2 + C^2} - \sqrt{x^2 + C^2} + \tfrac12\sqrt{(x+1)^2 + C^2}.$$

Here, $C$ is a positive constant parameter. The above function decays cubically [4] and
its Fourier transform even decays exponentially due to the exponential decay of the
modified Bessel function $K_1$ [1] that features in the generalised Fourier transform of the
multiquadric, here stated only in the one-dimensional case,

$$-\frac{2C}{|\xi|}\, K_1(C|\xi|)$$

(cf. Jones [14]).
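
The cubic decay is easy to observe numerically. The sketch below assumes that the second divided difference is taken at unit spacing with the normalisation $\tfrac12, -1, \tfrac12$, and that $C = 1$; the function name `phi` is chosen for illustration.

```python
import numpy as np

def phi(x, C=1.0):
    """Second divided difference of the multiquadric sqrt(x^2 + C^2) at unit spacing."""
    return 0.5 * np.hypot(x - 1.0, C) - np.hypot(x, C) + 0.5 * np.hypot(x + 1.0, C)

ratio = phi(100.0) / phi(200.0)      # close to 2**3 if phi decays like |x|**-3
scaled = 100.0**3 * phi(100.0)       # close to C**2 / 2
```

Doubling the argument reduces the value by a factor close to eight, and $|x|^3\phi(x)$ approaches $C^2/2$, which follows from the expansion $\sqrt{x^2 + C^2} = |x| + C^2/(2|x|) + \ldots$ for large $|x|$.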

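Once summation and integration are interchanged, (2.2) becomes

$$\int_{-\infty}^{\infty} \phi(x) \sum_{n=-\infty}^{\infty} \exp(i\theta n)\,\phi(x - hn)\,dx + \lambda \int_{-\infty}^{\infty} \phi'(x) \sum_{n=-\infty}^{\infty} \exp(i\theta n)\,\phi'(x - hn)\,dx = 1, \qquad \theta \in [-\pi,\pi], \tag{2.3}$$

or, applying the Poisson summation formula (Stein and Weiss [17], p. 252),

$$\begin{aligned}
&\int_{-\infty}^{\infty} \phi(x) \sum_{n=-\infty}^{\infty} \exp\big(i h^{-1} x(\theta + 2\pi n)\big)\, \hat\phi\big(h^{-1}(\theta + 2\pi n)\big)\,dx \\
&\quad + i\lambda h^{-1} \int_{-\infty}^{\infty} \phi'(x) \sum_{n=-\infty}^{\infty} \exp\big(i h^{-1} x(\theta + 2\pi n)\big)\,(\theta + 2\pi n)\, \hat\phi\big(h^{-1}(\theta + 2\pi n)\big)\,dx = h,
\end{aligned} \tag{2.4}$$

where $\theta \in [-\pi,\pi]$. Because $\phi$ vanishes at infinity, integration by parts of the second term
of (2.4) gives

$$\begin{aligned}
&\int_{-\infty}^{\infty} \phi(x) \sum_{n=-\infty}^{\infty} \exp\big(i h^{-1} x(\theta + 2\pi n)\big)\, \hat\phi\big(h^{-1}(\theta + 2\pi n)\big)\,dx \\
&\quad + \frac{\lambda}{h^2} \int_{-\infty}^{\infty} \phi(x) \sum_{n=-\infty}^{\infty} \exp\big(i h^{-1} x(\theta + 2\pi n)\big)\,(\theta + 2\pi n)^2\, \hat\phi\big(h^{-1}(\theta + 2\pi n)\big)\,dx \\
&= \sum_{n=-\infty}^{\infty} \hat\phi\big(h^{-1}(\theta + 2\pi n)\big)\, \hat\phi\big({-h^{-1}}(\theta + 2\pi n)\big) \big[1 + \lambda h^{-2}(\theta + 2\pi n)^2\big] = h.
\end{aligned}$$

Sobolev-orthogonal functions, radial basis functions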

Since $\phi$ is real, $\hat\phi(-\xi) = \overline{\hat\phi(\xi)}$, and this implies

$$\sum_{n=-\infty}^{\infty} \big|\hat\phi\big(h^{-1}(\theta + 2\pi n)\big)\big|^2 \big(1 + \lambda h^{-2}(\theta + 2\pi n)^2\big) = h, \qquad \theta \in [-\pi,\pi]. \tag{2.5}$$

This is our condition that leads to the required Sobolev-orthogonality. In summary, we
have established the following theorem.

Theorem 2.1 If the decay conditions on $\phi$, as stated above, hold in tandem with the
expression (2.5), then the required orthogonality condition (2.1) is satisfied.

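We note that, if we are given a $\psi$ such that

$$\sum_{n=-\infty}^{\infty} \big|\hat\psi\big(h^{-1}(\theta + 2\pi n)\big)\big|^2 = h, \qquad \theta \in [-\pi,\pi], \tag{2.6}$$

then the function $\phi$ defined by

$$\hat\phi(\xi) := \frac{\hat\psi(\xi)}{\sqrt{1 + \lambda\xi^2}} \tag{2.7}$$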

satisfies (2.5). This expression can be used to derive an explicit transformation which
takes a $\psi$ that satisfies (2.6) into a $\phi$ satisfying (2.5), although its practical computation
may be nontrivial. Indeed, by the Parseval-Plancherel theorem [17], we get the useful
identity

$$\phi(x) = \frac{1}{\pi\sqrt{\lambda}} \int_{-\infty}^{\infty} \psi(x - y)\, K_0\!\left(\frac{|y|}{\sqrt{\lambda}}\right) dy, \tag{2.8}$$

which is a convolution and whose Fourier transform is therefore (2.7) (cf., for instance,
Jones [14]). In (2.8), $K_0$ is the 0th modified Bessel function (Abramowitz and
Stegun [1]) which is positive on positive reals and satisfies $K_0(t) \sim -\log t$ near zero and
$K_0(t) \sim \sqrt{\pi/(2t)}\,e^{-t}$ for large $t$, similar to the asymptotics we have used before for the
$K_1$ modified Bessel function. Hence, by a lemma in [7] (see also Light and Cheney [15]),
$\phi$ decays algebraically of a certain order if $\psi$ does. Moreover, because $1/\sqrt{1 + \lambda x^2}$ is
positive, integer translates of $\phi$ are dense in $L_2$, say, provided that this is the case with
integer translates of $\psi$ [18].

In some trivial cases we may evaluate the integral (2.8) explicitly, for instance for
$\psi(x) = \cos x$, where the integral is again a constant multiple of the cosine function
(Abramowitz and Stegun [1]). Otherwise, the smoothness and fast exponential decay of
the modified Bessel function can be used together with a quadrature formula.

We may now use the translates of such Sobolev-orthogonal functions in the spectral
approximation of a PDE as above, letting $W := \{\phi(\cdot - nh) \mid n \in \mathbb{Z}\}$.

An example of a function $\psi$ that satisfies (2.5) is simply the characteristic function
scaled by $h$ of the interval $[-h\pi, h\pi]$. In that case, $|\psi(x)|$ decays like $1/|x|$. In fact, any
$\psi$ that satisfies $|\hat\psi(\xi)| \leq c(1+|\xi|)^{-1/2-\varepsilon}$ for positive $\varepsilon$ can be made to satisfy (2.6) by
subjecting it to the transformation

(2.9) 


see for instance (Battle [2]). If $\psi$ is compactly supported then the transformed $\psi$ will
not necessarily be compactly supported but decay exponentially [6].
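
For orientation, a normalisation of this kind can be written down directly from (2.6). The display below is a sketch patterned on condition (2.6), offered as an assumption about the general form of such an orthonormalisation and not as the authors' display (2.9); it presumes that the sum in the denominator converges and does not vanish.

```latex
\hat\psi(\xi) \;\longmapsto\;
\frac{\sqrt{h}\,\hat\psi(\xi)}
     {\sqrt{\displaystyle\sum_{n=-\infty}^{\infty}
       \bigl|\hat\psi\bigl(\xi + h^{-1}2\pi n\bigr)\bigr|^{2}}}
```

Since the denominator is periodic in $\xi$ with period $2\pi/h$, summing the squared modulus of the transformed function over the shifts $\xi + h^{-1}2\pi n$ gives exactly $h$, which is condition (2.6) with $\xi = h^{-1}\theta$.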

In order to find a class of examples of compactly supported $\psi$ that satisfy (2.6), see
Daubechies [8] for her compactly supported scaling functions $\psi$ which are fundamental
for the construction of Daubechies wavelets. For example, the following conditions are
sufficient for $\psi$, which shall be defined by its Fourier transform, to satisfy (2.6) for $h = 1$
(other $h$ can be used by scaling):

$$\hat\psi(\xi) = \prod_{j=1}^{\infty} h(2^{-j}\xi),$$

where, for some suitable coefficients $h_k$,

$$h(\xi) = \sum_{k=0}^{2N-1} h_k e^{-ik\xi}$$

has to satisfy $h(0) = 1$, $h(\pi) = 0$, and

$$|h(\xi)|^2 + |h(\xi + \pi)|^2 = 1, \qquad \xi \in [-\pi,\pi].$$
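
The three conditions on $h$ can be verified numerically for a concrete choice of coefficients. The sketch below uses the classical four-coefficient Daubechies filter ($N = 2$), normalised so that the coefficients sum to one; this particular filter is brought in here as an illustrative assumption, and `h_k` and `h` are names chosen to mirror the notation above.

```python
import numpy as np

s3 = np.sqrt(3.0)
# Four-coefficient Daubechies filter (N = 2), normalised so that h(0) = 1.
h_k = np.array([1.0 + s3, 3.0 + s3, 3.0 - s3, 1.0 - s3]) / 8.0

def h(xi):
    """Trigonometric polynomial h(xi) = sum_k h_k exp(-i k xi)."""
    k = np.arange(h_k.size)
    return np.sum(h_k * np.exp(-1j * np.multiply.outer(xi, k)), axis=-1)

xi = np.linspace(-np.pi, np.pi, 201)
identity = np.abs(h(xi)) ** 2 + np.abs(h(xi + np.pi)) ** 2
```

For these coefficients $h(0) = 1$, $h(\pi) = 0$ and the sum of squared moduli equals one across the whole interval, up to rounding.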

For the construction of such $h$, see [8]. Compactly supported basis functions are
important to approximate the numerical solution of a PDE as in the above example defined on
a compact $\mathcal{V}$. Moreover, any $\psi$ with the aforementioned decay property can be made to
satisfy (2.5) by the transformation

$$\hat\phi(\xi) = \frac{\sqrt{h}\,\hat\psi(\xi)}{\sqrt{\displaystyle\sum_{n=-\infty}^{\infty} \big|\hat\psi(\xi + h^{-1}2\pi n)\big|^2 \big(1 + \lambda(\xi + h^{-1}2\pi n)^2\big)}}. \tag{2.10}$$

They can also be found by applying the transformation (2.10) and using the
transformation (2.9) as well.
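
That a normalisation of this type enforces (2.5) can be confirmed numerically. The sketch below assumes a Gaussian $\hat\psi$, the values $h = 0.5$ and $\lambda = h^2$, a numerator $\sqrt{h}\,\hat\psi$, and truncated sums; all of these are illustrative assumptions, as are the function names.

```python
import numpy as np

def psi_hat(xi):
    return np.exp(-0.5 * xi**2)          # assumed example with rapid decay

def weighted_sum(xi, h, lam, n_max=50):
    """Sum over n of |psi_hat(xi + 2 pi n / h)|^2 (1 + lam (xi + 2 pi n / h)^2)."""
    n = np.arange(-n_max, n_max + 1)
    shifted = xi[..., None] + 2.0 * np.pi * n / h
    return np.sum(np.abs(psi_hat(shifted)) ** 2 * (1.0 + lam * shifted**2), axis=-1)

def phi_hat(xi, h, lam):
    return np.sqrt(h) * psi_hat(xi) / np.sqrt(weighted_sum(xi, h, lam))

def condition_2_5(theta, h, lam, n_max=20):
    """Left-hand side of (2.5) for each theta, with the sum truncated at |n| <= n_max."""
    n = np.arange(-n_max, n_max + 1)
    xi = (theta[:, None] + 2.0 * np.pi * n) / h
    return np.sum(np.abs(phi_hat(xi, h, lam)) ** 2 * (1.0 + lam * xi**2), axis=-1)

h_step = 0.5
lam = h_step**2
theta = np.linspace(-np.pi, np.pi, 17)
values = condition_2_5(theta, h_step, lam)
```

For every sampled $\theta$ the computed sum equals $h$ to rounding error, because the denominator is periodic and cancels the weighted sum in the numerator.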

We note finally that, for instance, when $\psi$ is a B-spline then its translates are dense
in $L_2$ if we allow $h$ to become arbitrarily small (see, for instance, Powell [16] and the
last section of this paper).

2.2  Sobolev-orthogonal  translates  of  a  function  in  higher  dimensions 

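Applying the approach of the previous subsection to the Sobolev inner product

$$\int_{\mathbb{R}^d} f(\mathbf{x})g(\mathbf{x})\,d\mathbf{x} + \lambda \int_{\mathbb{R}^d} \nabla^T f(\mathbf{x}) \nabla g(\mathbf{x})\,d\mathbf{x},$$

the outcome is the orthogonality condition

$$\sum_{\mathbf{n} \in \mathbb{Z}^d} \big|\hat\phi\big(h^{-1}(\boldsymbol\theta + 2\pi\mathbf{n})\big)\big|^2 \big(1 + \lambda h^{-2}\|\boldsymbol\theta + 2\pi\mathbf{n}\|^2\big) = h^d, \qquad \boldsymbol\theta \in [-\pi,\pi]^d, \tag{2.11}$$

which replaces (2.5). We are now also interested in the more general case of Sobolev-type
inner products

$$\int_{\mathbb{R}^d} f(\mathbf{x})g(\mathbf{x})\mu(\mathbf{x})\,d\mathbf{x} + \lambda \int_{\mathbb{R}^d} \nabla^T f(\mathbf{x}) \nabla g(\mathbf{x})\nu(\mathbf{x})\,d\mathbf{x},$$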

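where the weights $\mu$ and $\nu$ are positive. Here the orthogonality condition becomes more
complicated. Specifically, it is

$$\sum_{\mathbf{n} \in \mathbb{Z}^d} \Big[ \hat\phi_\mu\big(h^{-1}(\boldsymbol\theta + 2\pi\mathbf{n})\big)\, \overline{\hat\phi_\mu\big(h^{-1}(\boldsymbol\theta + 2\pi\mathbf{n})\big)} + \lambda h^{-2}\, \hat\phi_\nu\big(h^{-1}(\boldsymbol\theta + 2\pi\mathbf{n})\big)\, \overline{\hat\phi_\nu\big(h^{-1}(\boldsymbol\theta + 2\pi\mathbf{n})\big)} \Big] = h^d, \qquad \boldsymbol\theta \in [-\pi,\pi]^d,$$

where

$$\hat\phi_\mu := \hat\phi * \widehat{\sqrt{\mu}}, \qquad \hat\phi_\nu := (\|\cdot\| \times \hat\phi) * \widehat{\sqrt{\nu}},$$

and $*$ denotes continuous convolution, used as in (2.8), where $\psi$ is convolved with a
modified Bessel function.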

2.3  Error  estimates 

We can offer error estimates for the Sobolev-orthogonal bases, firstly, in the case when
$\phi$ is a univariate spline of fixed degree $m$, say, with knots on $h\mathbb{Z}$, and, secondly, in the


M. Buhmann, A. Iserles, and S. P. Nørsett


case when $\phi$ is a linear combination of translates of the radial Gauss kernel

$$e^{-\gamma^2 x^2/2}, \qquad x \in \mathbb{R},$$

along $h\mathbb{Z}$. In the former case it is known that the uniform approximation error to a
sufficiently smooth function from the linear space spanned by $\phi(\cdot - nh)$, $n \in \mathbb{Z}$, is at
most a constant multiple of $h^{m+1}$ ([16]). We have already mentioned that we require
$\lambda = O(h^2)$, therefore it can be deduced by twofold integration by parts that the Sobolev
error is indeed $O(h^{m+1})$. This can be generalized in a straightforward way to higher
dimensions by tensor-product B-splines.

Our $L^2(\mathbb{R})$ error estimates can be carried out as follows: Let $f$ be a band-limited
function, that is, one with a compactly-supported Fourier transform, which satisfies
such assumptions that imply that the best least-squares approximation using a Sobolev
inner product

$$s_h(x) = \sum_{n=-\infty}^{\infty} (f, \phi(\cdot - hn))_{\lambda,h}\, \phi(x - nh), \qquad x \in \mathbb{R}, \tag{2.12}$$

is well defined. For instance, we may require that $(f, f)_{\lambda,h} < \infty$, as well as sufficient
decay of the radial basis function $\phi$, i.e.

$$|\phi(r)| \le c(1 + |r|)^{-1-\varepsilon},$$

$$|\phi'(r)| \le c(1 + |r|)^{-1-\varepsilon},$$

$$|\hat\phi(r)| \le c(1 + |r|)^{-1-\varepsilon}$$

for a positive $\varepsilon$. Here $(\cdot, \cdot)_{\lambda,h}$ is the Sobolev inner product which we study in this note
and it is helpful to emphasise its dependence on $h$ in the subscript. We begin with the
piecewise polynomial, i.e. spline, case. Hence, let $\phi$ be from the space of splines of degree
$m$ with knots on $h\mathbb{Z}$ such that its translates are Sobolev orthogonal.

Theorem 2.2 Subject to the assumptions of the last paragraph, we have the error
estimate

$$\|s_h - f\|_2 = O(h^{m+1}), \qquad h \to 0. \tag{2.13}$$

Proof: We shall establish in the course of this proof an error estimate for the first
derivative of the error function in (2.13), so that an order of convergence can also be
concluded for the norm associated with our Sobolev inner product. Indeed, because the
Fourier transform is an $L^2(\mathbb{R})$ isometry, we may prove (2.13) by considering

$$\|\hat s_h - \hat f\|_2 \tag{2.14}$$

instead of the left-hand side of (2.13). The Fourier transform of (2.12) is

$$\hat s_h(\theta) = \sum_{n=-\infty}^{\infty} (f, \phi(\cdot - nh))_{\lambda,h}\, e^{-i\theta hn}\, \hat\phi(\theta), \qquad \theta \in \mathbb{R}.$$

The absolute convergence of the above is guaranteed by the decay conditions on $\phi$.
Hence the square of (2.14) is, by the Parseval-Plancherel Formula and periodisation of
the integrand with respect to $\theta$,

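$$\begin{aligned}
&\int_{-\infty}^{\infty} \Bigl| \hat f(\theta) - \sum_{n=-\infty}^{\infty} (f, \phi(\cdot - nh))_{\lambda,h}\, e^{-i\theta hn}\, \hat\phi(\theta) \Bigr|^2 d\theta \\
&\quad = \int_{-\infty}^{\infty} \Bigl| \hat f(\theta) - \sum_{n=-\infty}^{\infty} \int_{-\infty}^{\infty} \hat f(\xi) \hat\phi(\xi) e^{i\xi hn} (1 + \lambda\xi^2)\,d\xi\; e^{-i\theta hn}\, \hat\phi(\theta) \Bigr|^2 d\theta \\
&\quad = \int_{-\pi/h}^{\pi/h} \sum_{k=-\infty}^{\infty} \Bigl| \hat f(\theta + 2\pi k/h) - \hat\phi(\theta + 2\pi k/h) \\
&\qquad\qquad \times \sum_{n=-\infty}^{\infty} \int_{-\infty}^{\infty} \hat f(\xi) \hat\phi(\xi) e^{i\xi nh} (1 + \lambda\xi^2)\,d\xi\; e^{-i\theta nh} \Bigr|^2 d\theta.
\end{aligned} \tag{2.15}$$

The $(1 + \lambda\xi^2)$ term in the above comes from the derivative in the Sobolev inner product
and Fourier transform. Because $f$ is band-limited, for small enough $h$ (2.15) assumes the
form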

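$$\int_{-\pi/h}^{\pi/h} \sum_{k=-\infty}^{\infty} \Bigl| \hat f(\theta)\delta_{0k} - \hat\phi(\theta + 2\pi k/h) \sum_{n=-\infty}^{\infty} \int_{-\infty}^{\infty} \hat f(\xi) \hat\phi(\xi) e^{i\xi nh} (1 + \lambda\xi^2)\,d\xi\; e^{-i\theta nh} \Bigr|^2 d\theta. \tag{2.16}$$

Using again the band limitedness of $f$, together with the Poisson Summation Formula,
(2.16) can be brought into the form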


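$$\begin{aligned}
&\int_{-\pi/h}^{\pi/h} \sum_{k=-\infty}^{\infty} \Bigl| \hat f(\theta)\delta_{0k} - \hat\phi(\theta + 2\pi k/h) \\
&\qquad\qquad \times \frac{1}{h} \sum_{n=-\infty}^{\infty} \hat f(\theta + 2\pi n/h) \hat\phi(\theta + 2\pi n/h) \bigl(1 + \lambda(\theta + 2\pi n/h)^2\bigr) \Bigr|^2 d\theta \\
&\quad = \int_{-\pi/h}^{\pi/h} \sum_{k=-\infty}^{\infty} \bigl| \hat f(\theta)\delta_{0k} - h^{-1} \hat\phi(\theta + 2\pi k/h) \hat f(\theta) \hat\phi(\theta) (1 + \lambda\theta^2) \bigr|^2 d\theta.
\end{aligned} \tag{2.17}$$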

In the case when $\phi$ is in the aforementioned spline space, it can be expressed as the
inverse Fourier transform of

$$\hat\phi(\xi) = \frac{\sqrt{h}\, \hat r(\xi)}{\sqrt{\sum_{n=-\infty}^{\infty} |\hat r(\xi + h^{-1}2\pi n)|^2 \bigl(1 + \lambda(\xi + h^{-1}2\pi n)^2\bigr)}}, \qquad \xi \in \mathbb{R}, \tag{2.18}$$

where $\hat r(\xi) = \xi^{-m-1}$. This follows from (2.5) and from the fact that all splines from our
space are linear combinations of integer translates of $r(x) := |x|^m$, whose generalised
Fourier transform is a multiple of $\xi^{-m-1}$ [14]. Since any constant factors in front of
the function $\xi^{-m-1}$ in $\hat r$ cancel in the expression for $\hat\phi$ above, we have ignored them
straightaway. Substituting (2.18) into (2.17), we get the integral over $[-\pi/h, \pi/h]$ of

$$\sum_{k=-\infty}^{\infty} \left| \hat f(\theta)\delta_{0k} - \frac{\hat r(\theta + h^{-1}2\pi k)\, \hat r(\theta)\, \hat f(\theta) (1 + \lambda\theta^2)}{\sum_{n=-\infty}^{\infty} |\hat r(\theta + h^{-1}2\pi n)|^2 \bigl(1 + \lambda(\theta + h^{-1}2\pi n)^2\bigr)} \right|^2. \tag{2.19}$$

Considering (2.19) for each $k$ separately, it follows from (2.19) and from $\hat r(\xi) = \xi^{-m-1}$
that our claim is true. Indeed for the sum over all terms with $k \ne 0$, it is evident that
we obtain a factor of $h^{2m+2}$ from the numerator, because the denominator is periodic,
containing one term independent of $h$, and the nonvanishing expression $h^{-1}2\pi k$ in the
argument of $\hat r(\theta + h^{-1}2\pi k)$ guarantees $\hat r(\theta + h^{-1}2\pi k) \sim h^{m+1}$ due to $\hat r(\xi) = \xi^{-m-1}$. Of
course, the squares then taken provide the $h^{2m+2}$ instead of $h^{m+1}$.
On the other hand, for $k = 0$, we have for small enough $h$


1^(0  +  h~l27m)\2(  1  +  A  (0  +  h~127m)2) 

Yln^o  1^(0  +  h~l2im)\2(l  +  A (9  +  h~12im)2) 

1  +  o  W  +  h~l27rn)\2(l  +  A (0  +  h-12im)2) 


which is also $O(h^{2m+2})$, as required, because the numerator provides an $O(h^{2m})$, according to the rate of the decay of $\hat r$ and the power of $h$ in its argument. This is then
squared to provide $O(h^{4m}) = O(h^{2m+2})$.

As for the derivatives, one only has to multiply the Fourier transform of the error
function in (2.14) with $\theta$, and we get the same error estimate by multiplying the integrands in all the following integrals with $|\theta|^2$. $\Box$


The same analysis remains valid when considering integer translates of the Gauss
kernel $e^{-\gamma^2 x^2/2}$ in order to form $\phi$. In this case we make use of the fact that the Gauss
kernel has a Fourier transform which is a multiple of $e^{-x^2/(2\gamma^2)}$. We put this instead of
$\hat r$ into (2.19), and we then get arbitrarily-high orders of convergence from (2.14) as long
as we take $\gamma = O(h)$, see also [3]. For this choice $\phi$ is exponentially decaying, whereas
for splines of degree $m$ we merely get algebraic decay at infinity of order $-m-1$.

Abstract

R. M. Lewitt has introduced a family of compactly supported radial basis functions
which are particularly useful in discretising ill-posed inversion problems involving line
integrals. We discuss some practical considerations in their use and implementation,
compare square and triangular grids of the functions in two dimensions, and describe
some particularly favourable choices of the defining parameters.


1  Introduction 

In the article [5], R. M. Lewitt introduced a family of window functions

$$\psi(r) = \begin{cases} (1 - (r/a)^2)^{m/2}\, I_m\bigl(\alpha (1 - (r/a)^2)^{1/2}\bigr) / I_m(\alpha), & 0 \le r \le a, \\ 0, & \text{otherwise}, \end{cases} \tag{1.1}$$

where $I_m$ is the modified Bessel function of order $m$ (see Ch. III, 3.7 [13]). The implicit
dependence of $\psi$ on the parameters $a > 0$, $\alpha > 0$ and $m \in \mathbb{N}$ is discussed below. Lewitt’s
motivation for studying these functions is the use of translates of the radially symmetric
function

$$\Psi(x) = \psi(\|x\|) \qquad (x \in \mathbb{R}^d)$$

(see Figure 1) as a basis for the discretisation of tomographic problems [8, 9]. Such a basis
overcomes a number of difficulties associated with the usual, pixel-based, representation
in problems involving the recovery of a function from a set of line, curve or strip integrals
across its domain, while retaining the advantage of a sparse discretisation. The author’s
interest in these functions arises in their application to a Radon-like problem in the
remote sensing of ocean waves [15], a detailed exposition of which may be found in [3].
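To make the definition (1.1) concrete, here is a minimal sketch of the window and of the associated radial function using only the Python standard library. The modified Bessel function is summed from its power series, which is adequate for the moderate arguments of interest here; the function names and the default parameter values ($a = 1$, $\alpha = 3$, $m = 2$) are our own illustrative assumptions.

```python
import math


def bessel_i(order, x, terms=60):
    """Modified Bessel function I_order(x), x >= 0, summed from its power series."""
    half = 0.5 * x
    term = half ** order / math.gamma(order + 1.0)
    total = term
    for k in range(1, terms):
        # ratio of consecutive series terms
        term *= half * half / (k * (k + order))
        total += term
    return total


def lewitt_window(r, a=1.0, alpha=3.0, m=2):
    """Window (1.1): zero outside [0, a], equal to 1 at r = 0."""
    if r < 0.0 or r > a:
        return 0.0
    z = math.sqrt(max(0.0, 1.0 - (r / a) ** 2))
    return z ** m * bessel_i(m, alpha * z) / bessel_i(m, alpha)


def lewitt_radial(x, a=1.0, alpha=3.0, m=2):
    """Radially symmetric function Psi(x) = psi(||x||) for a point x in R^d."""
    return lewitt_window(math.sqrt(sum(c * c for c in x)), a, alpha, m)


if __name__ == "__main__":
    for r in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(r, lewitt_window(r))
```

The series evaluation was chosen only to keep the sketch free of dependencies; a library routine for $I_m$ would serve equally well.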

2  Discretising  x-ray  problems 

The discretisation of an x-ray transform inversion problem with Lewitt’s basis is straightforward. Given a set of centres $x_i \in \mathbb{R}^d$, one represents the (unknown) function $f$ as a
linear combination of the translates of $\Psi$,

$$f(x) = \sum_i \xi_i \Psi(x - x_i). \tag{2.1}$$

Fig. 1. Lewitt’s radial basis function in dimension 2 with m = 2, a = 3.

The given data in such problems are the values $I_j$ of integrals of $f$ over lines (or more
generally, submanifolds) $L_j$

$$I_j = \int_{L_j} f(x) = \sum_i \xi_i \int_{L_j} \Psi(x - x_i). \tag{2.2}$$

The latter integral in (2.2) is the projection or Abel transform of $\Psi$, which can be calculated explicitly in the linear case. For a line $L_j$ whose closest point to $x_i$ is at a distance
$s$ from it, and with the dependence of $\psi$ on $m$ here made explicit,

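$$2\int_0^\infty \psi_m\bigl(\sqrt{s^2 + t^2}\bigr)\,dt = a \left(\frac{2\pi}{\alpha}\right)^{1/2} \frac{I_{m+1/2}(\alpha)}{I_m(\alpha)}\, \psi_{m+1/2}(s)$$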

(see A7, [5]). Thus (2.2) reduces to a linear system which may be solved for the coefficients
$\xi_i$. If the support of the basis functions is small (i.e., if $a$ is small) then this linear system
has an unstructured sparsity which can be exploited by, for example, an iterative row-action solution method [2].
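The entries of this linear system are line integrals of the window, so it is worth checking a closed form against direct quadrature before relying on it. The sketch below does this with the standard library only. The closed form coded in `abel_closed_form`, namely $a\,(2\pi/\alpha)^{1/2}\, I_{m+1/2}(\alpha)/I_m(\alpha)\, \psi_{m+1/2}(s)$, is our reading of the formula and should be compared with (A7) of [5]; all function names are hypothetical.

```python
import math


def bessel_i(order, x, terms=60):
    """Modified Bessel function I_order(x), x >= 0, summed from its power series."""
    half = 0.5 * x
    term = half ** order / math.gamma(order + 1.0)
    total = term
    for k in range(1, terms):
        term *= half * half / (k * (k + order))
        total += term
    return total


def window(r, a, alpha, m):
    """Lewitt's window of (possibly half-integer) order m, zero for r > a."""
    if r < 0.0 or r > a:
        return 0.0
    z = math.sqrt(max(0.0, 1.0 - (r / a) ** 2))
    return z ** m * bessel_i(m, alpha * z) / bessel_i(m, alpha)


def abel_quadrature(s, a, alpha, m, panels=2000):
    """Midpoint rule for 2 * integral_0^T psi_m(sqrt(s^2 + t^2)) dt, T = sqrt(a^2 - s^2)."""
    if s >= a:
        return 0.0
    upper = math.sqrt(a * a - s * s)
    step = upper / panels
    total = sum(window(math.hypot(s, (j + 0.5) * step), a, alpha, m) for j in range(panels))
    return 2.0 * step * total


def abel_closed_form(s, a, alpha, m):
    """Assumed closed form: a sqrt(2 pi / alpha) I_{m+1/2}(alpha) / I_m(alpha) psi_{m+1/2}(s)."""
    scale = a * math.sqrt(2.0 * math.pi / alpha) * bessel_i(m + 0.5, alpha) / bessel_i(m, alpha)
    return scale * window(s, a, alpha, m + 0.5)


if __name__ == "__main__":
    for s in (0.0, 0.3, 0.6, 0.9):
        print(s, abel_quadrature(s, 1.0, 3.0, 2), abel_closed_form(s, 1.0, 3.0, 2))
```

In an actual discretisation only the closed form would be evaluated, once for each pair of line and centre whose distance $s$ is less than $a$.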

The computational cost of such a discretisation lies mainly in the evaluation of the
Abel transform which requires the calculation of a Bessel function. Fortunately, Bessel
functions of half-integer order can be calculated efficiently from their recurrence relations
(see the Atlas, [12], for details).

The discretisation techniques described here can also be applied to problems in which
the integrals are over curves of sufficient smoothness to allow a local linear approximation.

3 Fourier transform and invertibility


The Fourier transform of the d-dimensional basis function is radially symmetric and
given in (A3) of [5] as

$$\hat\Psi(x) = \frac{a^d \alpha^m (2\pi)^{d/2}}{I_m(\alpha)}\, \frac{J_{m+d/2}(z)}{z^{m+d/2}}, \qquad z = \sqrt{(2\pi a \|x\|)^2 - \alpha^2}. \tag{3.1}$$

The presence of the Bessel function $J_{m+d/2}(z)$ in this expression clearly implies that it is
not non-negative, and so by Bochner’s characterisation of positive definite translation-invariant functions, $\Psi$ is not positive definite for any choices of the parameters.

This fact denies us the attractive approximation theory of the compactly supported
radial functions of Wu, Wendland and Buhmann (Section 3, [1]). In particular, there is
no guarantee, per se, on the invertibility of the interpolation matrix $[\Psi(x_i - x_j)]$, needed
to ensure that (2.1) can represent an arbitrary function at its centres. However, this
interpolation matrix is invertible if it is strictly diagonally dominant (Corollary 5.6.17,
[4]) which, for a set of centres on a uniform grid $\Gamma$, holds if

$$\Psi(0) > \sum_{x \in \Gamma \setminus \{0\}} \Psi(x). \tag{3.2}$$

Values of the parameters for which (3.2) is satisfied for the square planar grid $\Delta\mathbb{Z}^2$ are
shown in Figure 2.
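Condition (3.2) is cheap to test for a given parameter set, since only the grid nodes within distance $a$ of the origin contribute. The following sketch does so for the square planar grid; the function names and the sample parameter values are our own illustrative assumptions.

```python
import math


def bessel_i(order, x, terms=60):
    """Modified Bessel function I_order(x), x >= 0, summed from its power series."""
    half = 0.5 * x
    term = half ** order / math.gamma(order + 1.0)
    total = term
    for k in range(1, terms):
        term *= half * half / (k * (k + order))
        total += term
    return total


def lewitt_window(r, a, alpha, m):
    """Lewitt's window (1.1), zero for r > a."""
    if r < 0.0 or r > a:
        return 0.0
    z = math.sqrt(max(0.0, 1.0 - (r / a) ** 2))
    return z ** m * bessel_i(m, alpha * z) / bessel_i(m, alpha)


def is_diagonally_dominant(a, alpha, m, spacing=1.0):
    """Check Psi(0) > sum of Psi over the non-zero nodes of the grid spacing * Z^2."""
    reach = int(math.floor(a / spacing))  # nodes farther than a contribute nothing
    off_diagonal = 0.0
    for i in range(-reach, reach + 1):
        for j in range(-reach, reach + 1):
            if (i, j) != (0, 0):
                off_diagonal += lewitt_window(spacing * math.hypot(i, j), a, alpha, m)
    return lewitt_window(0.0, a, alpha, m) > off_diagonal


if __name__ == "__main__":
    for ratio in (0.9, 1.5, 2.0, 3.0):
        print(ratio, is_diagonally_dominant(ratio, 3.0, 2))
```

When the grid ratio is below one the supports of neighbouring basis functions do not reach each other's centres, so the condition holds trivially.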

As  is  noted  in  [5],  there  are  several  reasons  why  a  rapid  decay  of  the  Fourier  transform 
of  the  basis  function  is  advantageous  in  functional  representation  for  the  inversion  of  x- 
ray  and  related  transforms. 

•  Such  inversions  may  be  complicated  by  functions  in  the  nullspace  of  the  transform, 
so-called  ghosts.  For  some  transforms  [7]  it  can  be  shown  that  such  functions  have 
a  Fourier  transform  which  is  small  close  to  zero  in  the  frequency  domain,  and  so 
representation  by  a  basis  with  Fourier  transform  localised  around  zero  will  suppress 
these  ghosts. 

•  These  inversions  are  often  ill-posed  and  the  given  data  noisy.  Representation  of  the 
sought  function  by  a  basis  with  localised  Fourier  transform  imposes  smoothness, 
and  so  acts  to  regularise  the  problem  in  the  sense  of  Tikhonov. 

•  It is often convenient to sample the inverted function on a grid which differs from
the set of centres $x_i$ of the basis. With a localised Fourier transform, such a sampling
can be performed without significant aliasing.

The asymptotic estimate $\hat\Psi_m(x) = O(1/\|x\|^{m+(d+1)/2})$ may be derived from (3.1) and
estimates of the Bessel function $J_{m+d/2}$ for large argument (see Eq. A4, [5]), a fact which should inform our choice
of $m$.

4  Choice  of  parameters 

One agreeable feature of Lewitt’s radial functions is that the choice of parameters of
the functions corresponds in a natural way to the balance between representation quality
and efficiency of computation. For example, the asymptotic rate of decay of the Fourier
transform increases with $m$ (see above), but so does the cost of the calculation of $I_m$.

A similar choice arises when the centres lie on a uniform (square or triangular) grid
$\Gamma$. Let $\Delta$ denote the grid spacing of such a grid, i.e., the minimum distance between
distinct centres in $\Gamma$. It is desirable that the grid ratio $a/\Delta$ be small, as this results in
sparsity of the discretisation. As a guide to fixing the values of $\alpha$ and the grid ratio,
Lewitt suggests the error in quasi-interpolation to a constant, the error with which the
function

$$g(x) = \sum_{x_i \in \Gamma} \Psi(x - x_i)$$

approximates the function whose constant value is that of $g$ at the centres (edge effects
are ignored here). In Figure 2 the root mean square of this representation error (estimated
numerically) is shown for the square planar grid, $m = 2$ and a range of values of $\alpha$ and
$a/\Delta$. The distinctive “trenches” in the error can be explained with the Poisson summation
formula (see [11]),

$$\sum_{n \in \mathbb{Z}^2} \Psi(x + \Delta n) = \frac{1}{\Delta^2} \sum_{n \in \mathbb{Z}^2} \exp(2\pi i\, n \cdot x/\Delta)\, \hat\Psi(n/\Delta). \tag{4.1}$$

The summand for $n = 0$ in the second sum is $\hat\Psi(0)$, so the representation error depends
only on the values of $\hat\Psi$ on the dual grid, $\mathbb{Z}^2/\Delta$. Provided that $\hat\Psi$ decays rapidly, we
would expect a small error when $\hat\Psi$ is zero, or close to zero, for the dual grid-nodes close
to the origin.

By (3.1), $\hat\Psi(x)$ is zero exactly when

$$J_{m+d/2}\Bigl(\sqrt{(2\pi a \|x\|)^2 - \alpha^2}\Bigr) = 0,$$

i.e., for radial values $\|x\| = R_k$,

$$R_k = \frac{1}{2\pi a} \sqrt{\eta_k^2 + \alpha^2},$$

where $\eta_k$ is the $k$-th zero of $J_{m+d/2}$. Thus, the requirement that the $k$-th zero of
$\hat\Psi(x)$ occurs at the radius of the closest non-zero dual grid node (i.e., $R_k = 1/\Delta$) is a
constraint on the values of $\alpha$ and $a/\Delta$

$$\alpha = \sqrt{(2\pi a/\Delta)^2 - \eta_k^2}. \tag{4.2}$$

The contours (4.2) agree well with the trenches evident in Figure 2. With the same intent
we can require that the $l$-th zero of $\hat\Psi(x)$ occur at the radius of the second closest dual
grid node ($R_l = \sqrt{2}/\Delta$). Points satisfying both of these constraints can be expected to
have a particularly small representation error. In Figure 2 these favourable choices are
labelled $k{:}l$.
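The two constraints can be turned into numbers directly. Writing $\rho = 2\pi a/\Delta$, the conditions $R_k = 1/\Delta$ and $R_l = \sqrt{2}/\Delta$ read $\rho^2 = \eta_k^2 + \alpha^2$ and $2\rho^2 = \eta_l^2 + \alpha^2$, so that $\rho^2 = \eta_l^2 - \eta_k^2$ and $\alpha^2 = \eta_l^2 - 2\eta_k^2$. The sketch below, which uses only the standard library, locates the zeros of $J_{m+d/2}$ by a sign scan with bisection and evaluates these expressions for the planar case $m = 2$, $d = 2$. The function names are hypothetical and the elimination above is our own rearrangement of the stated constraints.

```python
import math


def bessel_j(order, x, terms=80):
    """Bessel function J_order(x) from its power series (adequate for moderate x)."""
    half = 0.5 * x
    term = half ** order / math.gamma(order + 1.0)
    total = term
    for k in range(1, terms):
        term *= -half * half / (k * (k + order))
        total += term
    return total


def bessel_zeros(order, count, step=0.1):
    """First `count` positive zeros of J_order, by sign scan followed by bisection."""
    zeros, left = [], step
    while len(zeros) < count:
        right = left + step
        if bessel_j(order, left) * bessel_j(order, right) < 0.0:
            lo, hi = left, right
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                if bessel_j(order, lo) * bessel_j(order, mid) <= 0.0:
                    hi = mid
                else:
                    lo = mid
            zeros.append(0.5 * (lo + hi))
        left = right
    return zeros


def trench_alpha(grid_ratio, eta_k):
    """Constraint (4.2): alpha = sqrt((2 pi a / Delta)^2 - eta_k^2), or None if not real."""
    value = (2.0 * math.pi * grid_ratio) ** 2 - eta_k ** 2
    return math.sqrt(value) if value >= 0.0 else None


def favourable_choice(eta_k, eta_l):
    """Solve R_k = 1/Delta and R_l = sqrt(2)/Delta together for (a/Delta, alpha)."""
    alpha_sq = eta_l ** 2 - 2.0 * eta_k ** 2
    if alpha_sq < 0.0:
        return None
    grid_ratio = math.sqrt(eta_l ** 2 - eta_k ** 2) / (2.0 * math.pi)
    return grid_ratio, math.sqrt(alpha_sq)


if __name__ == "__main__":
    eta = bessel_zeros(3.0, 3)  # order m + d/2 = 3 for m = 2, d = 2
    print("zeros of J_3:", eta)
    print("choice 1:2 (a/Delta, alpha):", favourable_choice(eta[0], eta[1]))
    print("choice 1:3 (a/Delta, alpha):", favourable_choice(eta[0], eta[2]))
```

Whether a given pair $k{:}l$ yields a real $\alpha$ depends on the spacing of the zeros, which is why `favourable_choice` may return `None`.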

The above argument can also be applied to the triangular grid. Establishing the Poisson summation formula for such a grid is straightforward (either generalised from VII Section
2 of [11] or specialised from the formula for topological groups in [6]), and one finds that
the dual grid is the triangular grid with node spacing $2/(\Delta\sqrt{3})$. The representation error is,
qualitatively, similar to that shown in Figure 2. To make a quantitative comparison we
plot, in Figure 3, the representation error on the principal trench (i.e., along the contour
(4.2) for $k = 1$ in the case of the square grid) for each grid type and a number of values
of $m$.

Fig. 2. The representation error of the square planar grid for m = 2. The lower contour
map shows the root mean square error in representation for different values of the grid
ratio $a/\Delta$ and localisation $\alpha$. The upper figure shows the error along the trenches evident
in the lower. Favourable choices of the parameters are marked 1:2, 1:3, ..., and are also
shown in the lower figure. Values of the parameters to the left of the dashed line give
rise to a diagonally dominant interpolation matrix.

Fig. 3. Error in the principal trench for square (dashed) and triangular (solid) grids.

To ensure a fair comparison, the horizontal scale in Figure 3 is adjusted for each
grid type to give equal node densities. As is seen, the two grid types have similar error
performance, suggesting that the square grid (with attendant ease of implementation)
is to be preferred in practice.

5  The  functions  of  Wendland 

It is interesting to compare Lewitt’s functions with the radial basis functions of Wendland
[1, 14], positive definite functions whose window functions are piecewise polynomial. The
positive definiteness of Wendland’s functions indicates their usefulness in approximation,
for which extensive results exist, and a number of recent papers have explored their use
in the discretisation of partial differential equations.

The use of Wendland’s functions in x-ray problems does not appear to have been
investigated, although their Abel transforms can be obtained analytically. We do not address this question here, but indicate why Lewitt’s functions may offer some advantages
for such problems. The Fourier transform of Wendland’s function $\Phi_{2,0}$, whose window is
$\phi_{2,0}(r) = (1 - r)_+^2$, is proportional to

$$r^{-4} \int_0^{2\pi r} (2\pi r - t)^2\, t\, J_0(t)\,dt = O(r^{-3}) \qquad (r = \|x\|)$$

(see Section 3, [14]). In Figure 4, $\hat\Phi_{2,0}$ is plotted along with the Fourier transform $\hat\Psi_2$ of
Lewitt’s function with $a = 1$ and the parameter choice 1:2 of Figure 2. Although both
have the same asymptotic decay of the Fourier transform, Lewitt’s is more localised
about zero and thus may offer better suppression of ghosts in x-ray problems.

Fig. 4. Fourier transforms of basis functions.

Finally we mention that Buhmann has shown, in [1], that Wendland’s window function admits a convolution representation of the form


rOC 

p(r):=  (1  -r2/t)n+tng{t)dt  (5.1) 

Jo 

for  the  weight  g(t)  =  (1  -  £)+  with  suitable  k  and  n.  We  note  that  (5.1)  may  be  solved 
for  g ,  since  substituting  x  =  r2  in  (5.1)  allows  it  to  be  reduced  to  a  standard  integral 
equation  whose  solution, 

9(x)  =  ^~f{n)(x),  f(x)  =  p(r), 


can  be  found  in  Article  1.1-4.32  of  [10].  In  the  case  that  p  is  Lewitt’s  window  one 
may  use  the  differentiation  formula,  All  of  [5],  to  find  the  corresponding  weight  g.  For 
n  =  1  we  find  that 


9{x)  =  - 


1 


'm— 1 


(«) 


2a™-2  Im(a) 


a weight qualitatively different from that of Wendland’s function.

Abstract

In  this  paper  we  discuss  a  number  of  recent  developments  in  the  practice  of  how  to 
compute  with  radial  basic  functions.  The  two  main  problems  addressed  are  how  to  develop 
fast  evaluation  schemes  for  radial  basic  functions,  and  how  to  efficiently  carry  out  the 
solution  of  the  interpolation  problem.  The  approach  is  to  mainly  describe  work  which 
has  involved  the  author  and  Professor  Rick  Beatson  as  contributors,  and  to  include  an 
idiosyncratic  selection  of  works  by  other  researchers  which  have  attracted  the  attention 
of  the  author. 

1  Introduction 

Research into radial basic functions has been active now for about 30 years. The basic
setup is as follows. A function $\psi : \mathbb{R}^n \to \mathbb{R}$, which we refer to as the basic function, is
specified. A subspace is then constructed by reference to points $x_1, \ldots, x_m$ in $\mathbb{R}^n$. The
members of this subspace all have the form

$$s(x) = \sum_{i=1}^{m} a_i \psi(x - x_i), \qquad x \in \mathbb{R}^n,$$

where the $a_1, \ldots, a_m$ are real numbers. It is important to appreciate at the outset that
throughout this paper, and indeed in most of the papers appearing in this area, the underlying assumption is that the points $x_1, \ldots, x_m$ are distinct. One of the most common
tasks for which these functions are used is interpolation. A small amount of research
has been carried out where the points at which an interpolant is developed are arbitrary
distinct points in $\mathbb{R}^n$, but by far the majority of the work relates to interpolation which
is carried out at the same points as those used to effect the translation. Accordingly,
data $d_1, \ldots, d_m$ are given at $x_1, \ldots, x_m$, and we require that

$$d_j = s(x_j) = \sum_{i=1}^{m} a_i \psi(x_j - x_i), \qquad j = 1, \ldots, m. \tag{1.1}$$

Two immediate observations present themselves. Firstly, at the present level of generality
there is absolutely no guarantee that the Equations (1.1) will have a unique solution.
Secondly, one knows from the work of Mairhuber [14] that there are no Haar subspaces
of significant dimension in any space $\mathbb{R}^n$ for $n \ge 2$. What this means is that if we are to
construct interpolation problems which have a unique solution for each location of the
data points $x_1, \ldots, x_m$ and for each choice of the data $d_1, \ldots, d_m$, then the subspace used
must vary as the interpolation points vary. If we pause for a moment and consider how we
might in some sensible and orderly way vary the subspace as the points $x_1, \ldots, x_m$ vary,
then using simple shifts of a single basic function $\psi$ is one of the most natural choices.
It is very common to work with a function $\psi$ which is a radial function. Thus we take a
function $\phi : \mathbb{R}_+ \to \mathbb{R}$ and determine $\psi$ by the rule $\psi(x) = \phi(|x|)$ for all $x \in \mathbb{R}^n$. Note
that throughout this account, the symbol $|\cdot|$ will stand for the Euclidean norm in $\mathbb{R}^n$.
At this point a common inaccuracy arises. The function $\psi$ can be correctly referred to as
a radial basic function. However, many authors give this appellation to the function $\phi$,
whose radiality is of no consequence whatsoever, since it would imply that $\phi$ was simply
an even function on $\mathbb{R}$. Since $\phi$ only acts on $\mathbb{R}_+$ the idea that $\phi$ can be radial is vacuous.
Let us continue in this spirit of criticism a little while longer. As far as the author is
aware, only two people in the world would refer to $\psi$ as a basic function, or a radial basic
function. All other authors would use the word basis in place of basic. There are very
obvious problems with this terminology. We are seeking to generate subspaces which are
suitable for interpolation. Such subspaces will naturally have the same dimension as the
number of data, and the functions $\{\psi(\cdot - x_i) : i = 1, \ldots, m\}$ should form a basis for
the subspace. The use of the word basis in two completely different senses seems to the
author to be misleading and unhelpful, whereas use of the word basic (a difference of
one character) eliminates any possibility of confusion, and avoids the use of the word
basis, which has a very specific mathematical meaning, in a context where its meaning
is not the usual mathematical one.

The problem about whether interpolation is possible has a highly satisfactory answer
in the work of Micchelli [15]. We direct the reader to the book of Cheney and Light [10]
for a full account of these matters. A couple of examples will be helpful. If one chooses

$$s(x) = \sum_{i=1}^{m} a_i \phi(|x - x_i|) = \sum_{i=1}^{m} a_i \exp(-|x - x_i|^2), \qquad x \in \mathbb{R}^n,$$

or

$$s(x) = \sum_{i=1}^{m} a_i \phi(|x - x_i|) = \sum_{i=1}^{m} a_i |x - x_i|, \qquad x \in \mathbb{R}^n,$$

then the resulting interpolation problem is uniquely solvable for any choice of $x_1, \ldots, x_m$
and for any data $d_1, \ldots, d_m$. This result contrasts very strongly with the case for polynomial interpolation, where the data points $x_1, \ldots, x_m$ have to be constrained not to lie
on an algebraic surface of appropriate degree. Indeed, the alternative formulation of the
above result for the second example is quite often surprising to mathematicians who are
uninitiated in the theory of radial basic functions.

Theorem 1.1 Let $x_1, \ldots, x_m$ be distinct points in $\mathbb{R}^n$. Then the matrix $(|x_j - x_i|)$ is
invertible.
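The two examples above can be tried out in a few lines. The following sketch, which uses only the standard library, assembles the interpolation matrix $(\phi(|x_j - x_i|))$ for a small set of distinct points, four of which are deliberately collinear, and solves (1.1) by Gaussian elimination. The helper names (`solve`, `fit`, `evaluate`) and the sample points are our own; the small dense solver is adequate only for illustration.

```python
import math


def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    rows = [list(matrix[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        if abs(rows[pivot][col]) < 1e-14:
            raise ValueError("matrix is numerically singular")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            for c in range(col, n + 1):
                rows[r][c] -= factor * rows[col][c]
    solution = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(rows[i][c] * solution[c] for c in range(i + 1, n))
        solution[i] = (rows[i][n] - tail) / rows[i][i]
    return solution


def fit(points, data, phi):
    """Coefficients a_i with sum_i a_i phi(|x_j - x_i|) = d_j for every j."""
    matrix = [[phi(math.dist(p, q)) for q in points] for p in points]
    return solve(matrix, data)


def evaluate(x, points, coeffs, phi):
    """Value of the interpolant s at the point x."""
    return sum(a * phi(math.dist(x, p)) for a, p in zip(coeffs, points))


if __name__ == "__main__":
    points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (0.5, 1.5)]
    data = [1.0, -2.0, 0.5, 4.0, 3.0]
    for phi in (lambda r: r, lambda r: math.exp(-r * r)):
        coeffs = fit(points, data, phi)
        print([round(evaluate(p, points, coeffs, phi), 10) for p in points])
```

For the choice $\phi(r) = r$ the diagonal of the matrix is zero, so the pivoting in `solve` is essential rather than a refinement.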

Having drawn a clear distinction between polynomial approximation and approximation by (radial) basic functions it is at this point that we must consider having some
polynomial ingredients in our interpolant. This is done in a very standard way by a
process we call augmentation by polynomials. We consider interpolants of the form

$$s(x) = \sum_{i=1}^{m} a_i \phi(|x - x_i|) + p(x), \qquad (x \in \mathbb{R}^n).$$

Here $p$ is a polynomial of total degree at most $k - 1$. We still wish to interpolate to
$m$ pieces of information, but now have more than $m$ parameters to determine with
this information. The remaining parameters are determined via the ‘natural’ boundary
conditions. The full set of equations is

$$d_j = s(x_j) = \sum_{i=1}^{m} a_i \phi(|x_j - x_i|) + p(x_j), \qquad j = 1, \ldots, m,$$

$$0 = \sum_{i=1}^{m} a_i q(x_i), \qquad \text{for all } q \in \pi_{k-1}(\mathbb{R}^n).$$

Here $\pi_{k-1}(\mathbb{R}^n)$ represents the space of polynomials of total degree at most $k - 1$ in $\mathbb{R}^n$. Two
questions present themselves pretty quickly from this additional hypothesis. Why should
polynomials be added to the interpolant, and why are the boundary conditions chosen in
this particular way? In some sense it is essential that we allow ourselves the possibility
of adding polynomial terms to some of the interpolants, as we shall soon see. The most
important example of a radial basic function interpolant which has a polynomial part
will be the thin-plate spline. We will make considerable reference to this interpolant in
$\mathbb{R}^2$, where it has the form

$$s(x) = \sum_{i=1}^{m} a_i |x - x_i|^2 \ln |x - x_i| + ax + b, \qquad (x \in \mathbb{R}^2).$$

Note here that the parameter $a$ is a vector with two entries, as is $x$. Thus $ax$ stands
for the dot product between $a$ and $x$. The parameter $b$ is a real number. The natural
boundary conditions take the form

$$\sum_{i=1}^{m} a_i = \sum_{i=1}^{m} a_i s_i = \sum_{i=1}^{m} a_i t_i = 0,$$

where $x_i = (s_i, t_i)$, $i = 1, \ldots, m$. This particular interpolant exhibits a feature common
to all the cases where augmentation by polynomials is either necessary or desirable:
the degree of the polynomial added is very low. The usual choices are $k = 0$ (when
no polynomial term is added), $k = 1$ (when the term is a constant polynomial) and
$k = 2$ (when the added polynomial is linear). It is now no longer possible to carry
out interpolation for all choices of the points $x_1, \ldots, x_m$. One must avoid distributions
of these points which lie on a zero surface of the corresponding polynomial subspace.
In the explicit case we considered above (thin-plate splines), the very mild restriction
needed is that $x_1, \ldots, x_m$ should not all lie on a single straight line. The theory developed
by Micchelli [15] includes the case of augmentation by polynomials.
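The augmented system for the thin-plate spline can be assembled and solved directly. The sketch below, using only the standard library, places the $m$ interpolation equations and the three natural boundary conditions in one $(m+3) \times (m+3)$ system. The function names, the block layout of the matrix and the sample data are our own choices for illustration, and the points must not all lie on one straight line.

```python
import math


def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    rows = [list(matrix[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        if abs(rows[pivot][col]) < 1e-14:
            raise ValueError("matrix is numerically singular")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            for c in range(col, n + 1):
                rows[r][c] -= factor * rows[col][c]
    solution = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(rows[i][c] * solution[c] for c in range(i + 1, n))
        solution[i] = (rows[i][n] - tail) / rows[i][i]
    return solution


def tps_kernel(r):
    """Thin-plate spline basic function r^2 ln r, continued by 0 at r = 0."""
    return 0.0 if r == 0.0 else r * r * math.log(r)


def fit_thin_plate(points, data):
    """Return (coefficients a_i, vector a, scalar b) of the thin-plate spline interpolant."""
    m = len(points)
    size = m + 3
    matrix = [[0.0] * size for _ in range(size)]
    for j, pj in enumerate(points):
        for i, q in enumerate(points):
            matrix[j][i] = tps_kernel(math.dist(pj, q))
        # polynomial part a . x + b evaluated at x_j
        matrix[j][m], matrix[j][m + 1], matrix[j][m + 2] = pj[0], pj[1], 1.0
        # natural boundary conditions: sum a_i s_i = sum a_i t_i = sum a_i = 0
        matrix[m][j], matrix[m + 1][j], matrix[m + 2][j] = pj[0], pj[1], 1.0
    solution = solve(matrix, list(data) + [0.0, 0.0, 0.0])
    return solution[:m], solution[m:m + 2], solution[m + 2]


def evaluate_thin_plate(x, points, coeffs, vector, offset):
    """Value of the thin-plate spline interpolant at the point x in R^2."""
    radial = sum(c * tps_kernel(math.dist(x, p)) for c, p in zip(coeffs, points))
    return radial + vector[0] * x[0] + vector[1] * x[1] + offset


if __name__ == "__main__":
    points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.2)]
    data = [0.0, 1.0, 1.0, 0.0, 2.0]
    coeffs, vector, offset = fit_thin_plate(points, data)
    print([round(evaluate_thin_plate(p, points, coeffs, vector, offset), 10) for p in points])
```

If the data are sampled from a linear polynomial, the computed coefficients $a_i$ vanish up to rounding and the polynomial part reproduces the data, which is one way to see the role of the augmentation.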

We now propose to take a look at a very simple example which we hope will give the
reader a feel for some of the ideas and concepts we have introduced so far. We consider

$$s(x) = \sum_{i=1}^{m} a_i |x - x_i| + b \qquad (x \in \mathbb{R}).$$

Here the parameter $b$ is a real number, and the natural boundary condition gives us
$\sum_{i=1}^{m} a_i = 0$. A unique feature of the univariate case is that we can order the interpolation
points $x_1 < x_2 < \cdots < x_m$. Now consider the function $s$ in one of the intervals $[x_i, x_{i+1}]$,
$i = 1, \ldots, m - 1$. It is clear that in such an interval $s$ is simply a linear function. The
demand that $s$ interpolates the data at $x_1, \ldots, x_m$ means that $s$ must be the piecewise
linear interpolant to the data in the interval $[x_1, x_m]$. What is the effect of the ‘natural’
boundary conditions? In the interval $[x_m, \infty)$ we can write

$$s(x) = \sum_{i=1}^{m} a_i (x - x_i) + b = -\sum_{i=1}^{m} a_i x_i + b.$$

Thus  s  is  constant  in  [xm,  oo).  A  similar  calculation  reveals  that  s  is  constant  in  (— oo,  x\]. 
Combining  all  these  observations  shows  that  s  is  the  natural  linear  spline  interpolant 
to  the  data  at  x\ , . . .  ,a?m.  This  goes  some  way  to  explaining  why  the  word  ‘natural’  is 
appended  the  boundary  or  extra  conditions.  But  we  can  go  a  little  further.  It  is  well 
known  that  the  natural  splines  satisfy  a  variational  principle.  For  the  linear  spline,  if  we 
examine 

x  =  {feS'-.f’eL2(i R)}, 


/OO  rOO 

(s')2  <  /  (/'): 

-oo  J  — oo 


for  all  f  £  X  which  also  interpolate  the  data.  This  variational  principle  is  very  useful  in 
developing  error  estimates,  and  we  shall  return  to  this  general  thread  of  ideas  later  in  this 
account.  However,  we  ought  to  observe  that  S'  is  the  space  of  tempered  distributions, 
and  that  the  first  derivative  is  to  be  taken  in  the  distributional  sense.  There  are  ways 
of  getting  round  this  distributional  approach  (see  Cheney  and  Light  [10]  for  an  example 
which corresponds closely to the discussion here), but it does give the most succinct
description, and creates the technical background which will underpin all the theory which
has been developed in this area. Notice also that the quantity being minimised can be
used to specify a seminorm on $X$ simply by taking the square root of the integral. This
seminorm has as kernel $\pi_0(\mathbb{R})$, which is precisely the polynomial subspace we use to
augment the original radial basic function. Something very fundamental is happening here.
Most  mathematicians  would  regard  this  seminorm  as  being  a  measure  of  smoothness 
of  the  corresponding  function.  The  natural  linear  spline  therefore  interpolates  the  data, 
and  is  the  smoothest  interpolant  to  the  data  from  X  in  the  sense  that  it  possesses  the 
smallest  derivative  in  the  L2-norm.  If  we  are  to  pursue  this  very  natural  idea  of  making 
higher  derivatives  of  s  small,  then  we  will  naturally  develop  seminorms  with  polynomial 
kernels.  This  goes  a  long  way  towards  explaining  the  need  for  augmentation. 
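
The construction of the natural linear spline described above can be tried out numerically. The following is a minimal sketch, assuming NumPy is available; the function name `natural_linear_spline` and the sample nodes and data are illustrative choices and do not come from the text. It solves for $s(x) = \sum_i a_i |x - x_i| + b$ subject to $\sum_i a_i = 0$, so that one can check that $s$ interpolates the data and is constant outside $[x_1, x_m]$.

```python
import numpy as np

def natural_linear_spline(x_nodes, d):
    """Solve for s(x) = sum_i a_i |x - x_i| + b subject to sum_i a_i = 0."""
    x_nodes = np.asarray(x_nodes, dtype=float)
    m = len(x_nodes)
    A = np.abs(x_nodes[:, None] - x_nodes[None, :])    # entries |x_j - x_i|
    Q = np.ones((m, 1))                                # basis for the constants
    K = np.block([[A, Q], [Q.T, np.zeros((1, 1))]])    # augmented system
    rhs = np.concatenate([np.asarray(d, dtype=float), [0.0]])
    sol = np.linalg.solve(K, rhs)
    a, b = sol[:m], sol[m]

    def s(x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        return np.abs(x[:, None] - x_nodes[None, :]) @ a + b

    return a, b, s

# Illustrative nodes and data (assumed values, not from the text).
x_nodes = [0.0, 1.0, 2.5, 4.0]
d = [1.0, 3.0, 2.0, 5.0]
a, b, s = natural_linear_spline(x_nodes, d)

print(s(x_nodes))               # values at the nodes
print(s([-10.0, -1.0]))         # values to the left of x_1
print(s([4.0, 10.0, 100.0]))    # values to the right of x_m
```

To the right of $x_m$ the printed values should all agree with $d_m$, and to the left of $x_1$ with $d_1$, which is the constancy derived above from the condition $\sum_i a_i = 0$.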

Finally  in  this  introduction,  we  want  to  discuss  briefly  the  uses  to  which  radial  basic 
function interpolation is put. There are two significant feelings about interpolation by
these  functions.  Firstly,  it  is  thought  that  radial  basic  function  interpolation  is  very 
good  for  treating  scattered  data.  Loosely  speaking,  data  is  scattered  when  there  is  no 
possibility  of  determining  either  a  natural  choice  of  coordinate  axes,  or  an  origin.  It 
is  at  the  opposite  end  of  the  spectrum  to  gridded  data.  In  the  presence  of  a  cartesian 
product  for  the  data  sites,  it  is  much  more  efficient  to  use  univariate  methods  together 
with  tensor  product  constructions  to  do  the  interpolation.  Secondly,  radial  basic  function 
interpolation  is  thought  to  be  very  good  for  dealing  with  high  dimensional  data.  There  is 
some  evidence  from  the  realm  of  neural  networks  that  this  is  indeed  the  case,  but  we  will 
not  venture  into  the  area  of  high  dimensional  data  interpolation  in  this  paper.  Finally, 
many  of  the  data  sets  we  want  to  treat  have  very  large  numbers  of  data  sites  and  so  our 
aim  is  to  develop  methods  which  will  handle  10,000  to  1,000,000  data  sites  or  more. 

2  Computational  difficulties  and  fast  evaluation 

In  this  Section,  we  want  to  discuss  the  difficulties  that  arise  when  a  large  radial  basic 
function  interpolation  problem  is  posed.  We  shall  also  deal  with  one  of  the  essential  tools 
for  overcoming  some  of  the  difficulties.  The  system  we  want  to  solve  has  the  form 

\[ d_j = s(x_j) = \sum_{i=1}^{m} a_i \phi(|x_j - x_i|) + p(x_j) \qquad (j = 1, \dots, m) \qquad (2.1) \]

\[ 0 = \sum_{i=1}^{m} a_i q(x_i), \qquad \text{for all } q \in \pi_{k-1}(\mathbb{R}^n). \qquad (2.2) \]

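If we declare a basis for $\pi_{k-1}(\mathbb{R}^n)$ then we can write these equations in matrix form as

\[ \begin{pmatrix} A & Q \\ Q^T & 0 \end{pmatrix} \begin{pmatrix} a \\ c \end{pmatrix} = \begin{pmatrix} d \\ 0 \end{pmatrix}, \]

where $c$ holds the coefficients of the polynomial $p$ in the chosen basis.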

Here the matrix $A$ has entries $\phi(|x_j - x_i|)$ and is $m \times m$. The matrix $Q$ has entries
$p_\ell(x_j)$, where $p_1, \dots, p_\nu$ is a basis for $\pi_{k-1}(\mathbb{R}^n)$, and is of size $m \times \nu$. Recall from our
assumptions that only low degree polynomials are used, and so $Q$ is a long thin matrix.
In the case of thin-plate splines in $\mathbb{R}^2$ it would have size $m \times 3$. However, $A$ is a very
large matrix, with absolutely no sparsity. In fact, for thin-plate splines, the matrix $A$
is zero on the diagonal, and has large off-diagonal entries. In solving a large system of
linear  equations,  the  only  effective  strategy  is  to  use  an  iterative  solver.  Such  a  solver 
will  involve  many  multiplications  of  the  matrix  A  with  a  vector  a,  and  the  full  nature 
of  A  makes  this  a  very  costly  process.  One  of  the  key  discoveries  in  this  area  was  the 
Beatson  and  Newsam  [8]  result  which  showed  how  fast  multipole  algorithms  could  be 
applied  to  this  area.  If  we  consider  the  expression 

\[ s(x_j) = \sum_{i=1}^{m} a_i |x_j - x_i|^2 \ln(|x_j - x_i|) + p(x_j) \]

for some $x_j \in \mathbb{R}^n$, then this can be considered as an evaluation of the function $s$ at the
point $x_j$, or the formation of an element in the matrix vector product $Aa$. Because of
this, most authors tend only to consider how to evaluate the function $s$ in an efficient way,
generating what are known as fast evaluation algorithms. It is impossible to estimate
properly the importance of this discovery. Anyone involved in programming iterative
solutions to the thin-plate spline equations with tens of thousands of points would find
that  any  such  algorithm  would  just  grind  itself  into  the  dust  without  this  technology.  The 
technology  really  has  two  aspects:  a  mathematical  tool,  and  a  programming  structure. 
Here  we  intend  to  give  only  the  flavour  of  the  argument.  The  reader  who  really  wants  to 
know  the  details  is  advised  to  look  either  at  the  original  paper  [8] ,  or  the  later  paper  of 
Beatson  and  Light  [5]  which  deals  with  polyharmonic  splines.  She  can  also  look  at  two 
papers  which  give  clear  explanations  of  simple  cases.  The  first  is  found  in  a  survey  paper 
by  Beatson  and  Greengard  [3].  The  second  is  a  technical  report  by  Beatson,  Levesley 
and  Light  [7].  This  last  paper  discusses  fast  evaluation  methods  on  the  circle  and  higher 
dimensional  spheres,  and  the  reader  will  find  a  very  careful  and  full  account  of  the 
one-dimensional circle case. The first trick with problems in $\mathbb{R}^2$ is to consider complex
variables, rather than points in $\mathbb{R}^2$. Let $z$ be a point at which we wish to evaluate $s$, and
$u$ a data point, or centre. Then

\[ |z-u|^2 \ln|z-u| = \Re\bigl(|z-u|^2 \ln(z-u)\bigr) = \Re\bigl(|z-u|^2 \ln z\bigr) + \Re\Bigl(|z-u|^2 \ln\Bigl(1 - \frac{u}{z}\Bigr)\Bigr). \]

Look  at  the  last  two  expressions  here.  The  first  of  them  has  the  centre  u  in  the  square  of 
the  modulus  term,  and  this  expression  is  quite  cheap  to  evaluate,  even  if  there  are  many 
centres  u.  The  effect  of  many  centres  on  the  second  term  is  however  quite  profound,  and 
it  is  with  this  term  that  we  must  work.  The  idea  is  to  set  a  tolerance,  and  only  aim  to 
evaluate  s  to  within  this  tolerance,  rather  than  exactly.  The  appropriate  series  expansion 
can  then  be  used: 


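\[ \ln\Bigl(1 - \frac{u}{z}\Bigr) = -\sum_{p=1}^{\infty} \frac{1}{p}\Bigl(\frac{u}{z}\Bigr)^{p} \approx -\sum_{p=1}^{N} \frac{1}{p}\Bigl(\frac{u}{z}\Bigr)^{p} = \sum_{p=1}^{N} f_p(u)\, z^{-p}. \]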


The value of $N$ depends on the tolerance demanded of the evaluation and the relative
sizes of $u$ and $z$. For this reason, we think of $z$ as far away from the origin in $\mathbb{R}^2$, and
$u$ close to the origin. If there are now many centres $u_1, \dots, u_m$ near the origin, and $z$ is
far away from the origin, then we can summarise the effects of linear combinations of
all these centres as follows:


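\[ \sum_{i=1}^{m} a_i \ln\Bigl(1 - \frac{u_i}{z}\Bigr) \approx \sum_{i=1}^{m} a_i \sum_{p=1}^{N} f_p(u_i)\, z^{-p} = \sum_{p=1}^{N} \sum_{i=1}^{m} a_i f_p(u_i)\, z^{-p} = \sum_{p=1}^{N} g_p(u_1, \dots, u_m)\, z^{-p}. \]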

The principle now is to use the last expression above to make an approximate evaluation
of $s$. Of course, the assumption that $z$ was far from the origin and $u_1, \dots, u_m$ were
close to the origin is not important. It is simply important that $z$ be far away from the
cluster of centres $u_1, \dots, u_m$. The summarising expression is referred to as a Laurent type
expansion, because it summarises the contribution of the centres $u_1, \dots, u_m$ in terms of

Fig. 1. Fast evaluation panelling.


series  involving  negative  powers  of  z.  There  is  now  a  lot  of  preprocessing  to  go  on  before 
the  fast  evaluation  algorithm  is  ready  to  roll.  Figure  1  shows  how  the  algorithm  proceeds. 
The shaded square at the bottom left of the domain is the square which contains $z$, the
evaluation  point.  All  the  squares  around  this  one  which  are  the  same  size  are  deemed 
to  be  ‘close’  to  the  evaluation  square.  All  other  squares  are  ‘far  away’.  Of  course,  as  the 
squares  get  further  away  from  z  it  becomes  possible  to  use  our  summarising  technique 
to  total  up  the  contributions  of  larger  and  larger  numbers  of  points.  This  is  done  in  a 
very  explicit  manner,  which  is  represented  by  the  shading  in  Figure  1.  As  we  get  further 
away,  we  double  the  size  of  the  squares  over  which  we  summarise,  and  there  is  a  band  of 
same-size  squares  (or  a  ring,  if  the  evaluation  square  was  in  the  middle  of  the  domain) 
two  squares  wide  surrounding  the  evaluation  square.  Once  all  the  preprocessing  is  done, 
and  we  shall  discuss  this  a  little  more  in  a  moment,  all  the  needed  coefficients  gp  are 
available, and evaluation can be carried out in about $O(\log m)$ flops instead of $O(m)$.

The  above  account  does  not  quite  reveal  the  whole  story.  The  coefficients  gp  are 
calculated  in  an  orderly  manner  which  greatly  improves  the  efficiency  of  the  algorithm. 
Suppose our problem is located in $[0,1]^2$. An initial decision is made to divide the
original domain into squares of size $2^{-n}$. There is then a parent-child relationship derived
through a quad-tree data structure. The parent $[0,1]^2$ has four children: $[0,0.5]^2$, $[0.5,1]^2$,
$[0,0.5] \times [0.5,1]$ and $[0.5,1] \times [0,0.5]$. This parent-child relationship helps in setting up
the coefficients $g_p(u_1, \dots, u_m)$ in an efficient way. There is also a further idea involving
Taylor series, which gives more efficiency. We omit any description of this technique.
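
The summarising step for a single cluster can be illustrated with a few lines of code. This is only a sketch of the flavour of the idea, not the full fast evaluation algorithm: it treats the logarithmic term $\sum_i a_i \ln(1 - u_i/z)$ alone, takes $f_p(u) = -u^p/p$ from the series above, and uses centres, weights and an evaluation point chosen purely for illustration. The function names are hypothetical.

```python
import cmath

def laurent_coefficients(centres, weights, n_terms):
    """g_p = sum_i a_i f_p(u_i) with f_p(u) = -u**p / p, for p = 1..n_terms."""
    return [sum(-a * u**p / p for a, u in zip(weights, centres))
            for p in range(1, n_terms + 1)]

def far_field(g, z):
    """Evaluate the truncated expansion sum_p g_p z**(-p)."""
    return sum(gp * z ** (-p) for p, gp in enumerate(g, start=1))

def direct(centres, weights, z):
    """Evaluate sum_i a_i ln(1 - u_i / z) term by term."""
    return sum(a * cmath.log(1 - u / z) for a, u in zip(weights, centres))

# A small cluster near the origin and a far-away evaluation point (assumed values).
centres = [0.10 + 0.05j, -0.08 + 0.02j, 0.03 - 0.09j, -0.05 - 0.04j]
weights = [1.0, -2.0, 0.5, 1.5]
z = 3.0 + 2.0j

g = laurent_coefficients(centres, weights, 8)   # computed once per cluster
error = abs(far_field(g, z) - direct(centres, weights, z))
print(error)
```

The coefficients `g` depend only on the cluster, so once they are stored each new evaluation point costs a number of operations proportional to the truncation length rather than to the number of centres.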

3  Inverting  the  interpolation  matrix 

Recall as at the beginning of the previous section that the equations specifying the
interpolation problem are as follows:

\[ d_j = s(x_j) = \sum_{i=1}^{m} a_i \phi(|x_j - x_i|) + p(x_j), \qquad (j = 1, \dots, m) \qquad (3.1) \]

\[ 0 = \sum_{i=1}^{m} a_i q(x_i), \qquad \text{for all } q \in \pi_{k-1}(\mathbb{R}^n). \qquad (3.2) \]

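In matrix terms we have

\[ \begin{pmatrix} A & Q \\ Q^T & 0 \end{pmatrix} \begin{pmatrix} a \\ c \end{pmatrix} = \begin{pmatrix} d \\ 0 \end{pmatrix}, \]

where $c$ holds the coefficients of the polynomial $p$ and $A$ is a full matrix which tends to exhibit poor conditioning. The poor conditioning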
of  A  is  similar  to  problems  experienced  by  researchers  in  the  theory  of  finite  elements  — 
as  the  interpolation  points  become  very  dense  in  a  given  region,  the  conditioning  gets 
worse.  In  fact,  there  are  formal  statements  relating  some  impression  of  the  condition 
number  of  A  (usually  the  smallest  eigenvalue  of  A)  to  the  minimum  interpoint  distance. 
The  following  table  represents  the  condition  number  of  A  when  the  interpolation  points 
are given on a uniform $5 \times 5$ grid in $[0, a]^2$. Of course, on a philosophical level, it does not


Scale parameter $a$    Condition number
1.0                    $3.6458 \times 10^{2}$
0.1                    $2.5179 \times 10^{4}$
0.01                   $2.4364 \times 10^{6}$
0.001                  $2.4349 \times 10^{8}$

Tab. 1 Two norm condition numbers of A.

make any sense whatsoever to describe an interpolation problem as being ill-conditioned.
Let's discuss this point in a little more depth. Suppose $x_1, \dots, x_m$ are points in $\mathbb{R}^n$, and
$G_1, \dots, G_m$ are a set of functions from $\mathbb{R}^n$ to $\mathbb{R}$ which are linearly independent over
$\{x_1, \dots, x_m\}$. That is, interpolation to arbitrary data at $x_1, \dots, x_m$ by linear combinations
of $G_1, \dots, G_m$ is always uniquely possible. Then there is a basis $F_1, \dots, F_m$ for the
linear span of $G_1, \dots, G_m$ such that $F_i(x_j)$ is 1 if $i = j$ and is zero for all other values of
$i, j$ between 1 and $m$. If the given data is $d_1, \dots, d_m$, then the interpolant can be written
down immediately as

\[ \sum_{i=1}^{m} d_i F_i(x) \qquad (x \in \mathbb{R}^n). \]


If one has in one's hands the basis $\{F_1, \dots, F_m\}$ and wants to know the coefficients
which must be used then one need only invert the identity matrix to obtain the solution,
and there are not many matrices which are better conditioned than the identity matrix!
Of course, getting one's hands on the basis $F_1, \dots, F_m$ is usually rather difficult, as
hard as solving the original problem in fact. It has become traditional to refer to the
basis $F_1, \dots, F_m$ as the Lagrange basis (in sympathy with the fact that Lagrange was a
person who wrote down this basis for polynomial interpolation in one dimension) or the
cardinal basis. This last term seems to the author to be quite appropriate, indicating
that the basis is special. However, it does not find favour with spline theorists, since
they think of the word cardinal in a very technical sense (the interpolation points are
$\mathbb{Z}$). Terminology aside, the point is still made that the conditioning of any interpolation
problem is a function of the available basis. A more practical case of this phenomenon
is the problem of natural cubic splines in $\mathbb{R}$. They fit into the radial basic function
interpolation scenario, because a natural cubic spline with knots at $x_1, \dots, x_m$ can be
written as

\[ s(x) = \sum_{i=1}^{m} a_i |x - x_i|^3 + ax + b \qquad (x \in \mathbb{R}). \]

If we require this spline to interpolate data $d_1, \dots, d_m$ at $x_1, \dots, x_m$ then we have to
require that $s(x_j) = d_j$ for $j = 1, \dots, m$. The natural property comes, as expected, from
the natural boundary conditions:

\[ \sum_{i=1}^{m} a_i = \sum_{i=1}^{m} a_i x_i = 0. \]

The ill-conditioning illustrated in Table 1 would be equally present in this example, and
the remark that the condition number increases as the interpoint spacing decreases would also
hold good. Of course, to suggest the use of this basis to a spline practitioner would not
be  a  good  idea!  We  are  well  used  to  the  idea  that  B-splines  are  the  correct  basis  to  use 
in  this  situation. 

I  suppose  the  two  principles  to  emerge  from  the  above  discussion  are  that  the  basis 
we  have  used  thus  far  to  describe  the  interpolation  problem  is  not  satisfactory  from  a 
computational  point  of  view,  and  that  in  at  least  some  of  the  cases  under  discussion 
(all  of  them  one-dimensional)  there  are  other  bases  which  are  superior.  There  are  other 
ways  to  conceptualise  the  difficulties  we  experience  with  the  radial  basic  functions.  Most 
of  them  tend  to  grow  at  infinity,  and  have  small  value  at  zero.  As  a  general  principle, 
we  would  like  a  basis  to  mimic  the  B-spline  basis.  That  is,  we  would  like  the  basis  to 
be  local  if  possible  —  each  basis  function  having  a  fairly  small  support  around  one  of 
the  interpolation  points.  The  first  people  to  make  progress  in  this  area  were  Dyn  and 
Levin  [11]  in  1983.  There  is  a  later  paper  with  Rippa  [12]  in  1986  which  is  also  worth 
looking at. Their technique was based on the observation that if $F(x) = |x|^2 \ln|x|$, and
$x \in \mathbb{R}^2$, then $\nabla^4 F = 8\pi\delta$. Here, $\nabla^4$ represents the bilaplacian, and $\delta$ is the Dirac delta
distribution whose action on each rapidly decreasing function in $S$ is to evaluate it at
zero. This description alone should alert us to the fact that $\nabla^4 F = 8\pi\delta$ is a distributional
equation, and as such must be handled with care. However, numerical analysts dash in
where others fear to tread, and we can approximate the Laplacian as follows:

\[ (\nabla^2 F)(x) \approx h^{-2}\{F(x - he_1) + F(x + he_1) + F(x - he_2) + F(x + he_2) - 4F(x)\} \qquad (x \in \mathbb{R}^2). \]

Here $h$ is a real parameter, and $e_1$ and $e_2$ are the usual unit vectors in $\mathbb{R}^2$. Pictorially, we
can represent this approximation by the stencil shown in Figure 2.

Fig. 2. The stencil for the Laplacian.

The bilaplacian stencil is shown in Figure 3. This observation is used in a straightforward way if the interpolation
points  lie  on  a  grid.  Instead  of  using  the  thin-plate  spline  radial  basic  function  to  generate 
a  basis,  one  uses  the  appropriate  linear  combinations  which  represent  the  bilaplacian  of 
this function. Because one has a distributional equation relating this quantity to the $\delta$
function, one does not expect to get the $\delta$ function exactly, but one certainly does expect
to get a function which decays rapidly at infinity, and this is exactly what happens. Dyn and
Levin  provide  some  encouraging  numerical  results.  Of  course,  there  remains  the  problem 
of  what  to  do  when  the  data  is  not  gridded.  Here  one  must  develop  first  the  appropriate 
stencil  for  the  Laplacian  on  a  point  by  point  basis.  This  may  seem  laborious,  but  in  fact 
the  next  few  methods  we  will  describe  all  compute  better  basis  elements  on  a  point  by 
point  basis. 
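
For gridded data the decay of the new basis element can be observed directly. The sketch below is an illustration under stated assumptions rather than the Dyn and Levin construction itself: it applies the five-point Laplacian stencil twice to the thin-plate spline $F(x) = |x|^2 \ln|x|$ with $h = 1$, and prints the result at increasing distances from the origin. The helper names `tps` and `discrete_laplacian` are hypothetical.

```python
import math

def tps(x, y):
    """Thin-plate spline F(x) = |x|^2 ln|x|, with F(0) = 0."""
    r2 = x * x + y * y
    return 0.0 if r2 == 0.0 else 0.5 * r2 * math.log(r2)

def discrete_laplacian(f, h):
    """Return the five-point stencil approximation to the Laplacian of f."""
    def lap(x, y):
        return (f(x - h, y) + f(x + h, y) + f(x, y - h) + f(x, y + h)
                - 4.0 * f(x, y)) / (h * h)
    return lap

h = 1.0
# Applying the Laplacian stencil twice gives a discrete bilaplacian of F.
psi = discrete_laplacian(discrete_laplacian(tps, h), h)

for k in (0, 2, 5, 10, 20):
    print(k, psi(k * h, 0.0))
```

The value at the origin is of order one, while the values further out shrink quickly, which is the behaviour one hopes for in a basis element that is to mimic a local B-spline.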

Perhaps  the  most  successful  class  of  schemes  of  this  nature  —  computing  a  new 
basis  on  a  point  by  point  approach  —  comes  from  Beatson,  Goodsell  and  Powell  [2]  and 
Beatson, Cherrie and Mouat [1]. Their approach is perhaps simpler to appreciate and
implement than that of Dyn and Levin. They begin with the observations I made earlier:
what we are really after is the cardinal basis $F_1, \dots, F_m$ with the property that $F_i(x_j)$
is 1 if $i = j$ and is zero for all other values of $i, j$ between 1 and $m$. However, because
this problem is as difficult to solve as the original one, we proceed as follows.

Fig. 3. The stencil for the bilaplacian.

Consider the job of trying to construct $F_i$. This function is supposed to be 1 at $x_i$ and zero at all
other points. Choose about 50 near neighbours of $x_i$, say $y_j \in \{x_1, \dots, x_m\}$. This choice
must include $x_i$. Then take

\[ F_i(x) = \sum_{j=1}^{50} a_j |x - y_j|^2 \ln|x - y_j| + bx + c, \qquad (x \in \mathbb{R}^2). \]

We demand that

\[ F_i(y_j) = \begin{cases} 1 & \text{if } y_j = x_i, \\ 0 & \text{otherwise,} \end{cases} \]


and  that  the  natural  boundary  conditions  are  also  satisfied.  Thus  we  are  producing 
approximate  cardinal  functions  which  have  the  value  1  at  the  required  point,  but  are  only 
zero  on  about  50  neighbouring  points.  This  suggestion  is  based  on  the  fact,  observed  by 
many  workers,  that  such  functions  are  often  small  elsewhere  in  the  domain.  We  produce 
some pictures to illustrate this. In the first (Figure 4), 289 points are spaced on a regular
grid in $[0,1]^2$. The approximate cardinal function is based on the 13 points shown in bold
in Figure 4. Figure 5 illustrates the same situation, but now as shown the points used to
develop the cardinal function are all clustered in one corner of the domain. The effect is
to produce significant values at the opposite corner of the domain. One can infer from
this  that  whenever  the  data  is  pretty  much  uniformly  distributed,  the  cardinal  functions 
using  points  well  inside  the  domain  will  have  good  properties,  while  those  at  the  edge 
will  be  poor.  Similarly,  in  a  non-uniform  distribution,  those  interior  to  a  cloud  of  points 
will  behave  well,  while  those  at  cloud  boundaries  might  not. 
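
The construction of one approximate cardinal function can be sketched as follows. This is a minimal illustration assuming NumPy, not the implementation of the cited authors: it uses the 289 point grid and 13 neighbours mentioned above, solves the small augmented thin-plate spline system on those neighbours, and then evaluates the result on the whole grid. The names `tps_kernel` and `approximate_cardinal` are hypothetical.

```python
import numpy as np

def tps_kernel(r):
    """phi(r) = r^2 ln r, with phi(0) = 0."""
    out = np.zeros_like(r)
    mask = r > 0
    out[mask] = r[mask] ** 2 * np.log(r[mask])
    return out

def approximate_cardinal(points, i, k):
    """Build F_i from the k nearest neighbours of points[i] (including itself)."""
    dist = np.linalg.norm(points - points[i], axis=1)
    idx = np.argsort(dist, kind="stable")[:k]       # idx[0] == i, at distance zero
    y = points[idx]
    A = tps_kernel(np.linalg.norm(y[:, None, :] - y[None, :, :], axis=2))
    Q = np.hstack([np.ones((k, 1)), y])             # basis 1, x, y for linear polynomials
    K = np.block([[A, Q], [Q.T, np.zeros((3, 3))]])
    rhs = np.zeros(k + 3)
    rhs[0] = 1.0                                    # value 1 at points[i], 0 at the others
    sol = np.linalg.solve(K, rhs)
    coef, poly = sol[:k], sol[k:]

    def F(x):
        x = np.atleast_2d(x)
        r = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=2)
        return tps_kernel(r) @ coef + poly[0] + x @ poly[1:]

    return idx, F

g = np.linspace(0.0, 1.0, 17)                       # 17 x 17 = 289 grid points
points = np.array([(u, v) for u in g for v in g])
centre = 8 * 17 + 8                                 # the grid point (0.5, 0.5)
idx, F = approximate_cardinal(points, centre, 13)
vals = F(points)

others = np.setdiff1d(np.arange(len(points)), idx)
print(vals[centre], np.max(np.abs(vals[idx[1:]])), np.max(np.abs(vals[others])))
```

The last printed number is the largest value of the function at grid points that were not used in its construction, so it gives a rough measure of how far the function is from being truly cardinal. Moving `centre` to a corner of the grid allows a comparison with the less favourable situation described above.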

There  are  two  methods  for  dealing  with  the  difficulties  which  have  shown  up  above. 
Fig. 4. Approximate cardinal function with points central to the domain.

Fig. 5. Approximate cardinal function with points at one corner of the domain.

Fig. 6. Approximate pinned cardinal function with points central to the domain.

Fig. 7. Approximate pinned cardinal function with points at one corner of the domain.

Firstly, one can pin all the cardinal functions at a fixed set of judiciously chosen points,
so that every cardinal function must have the value zero at these points. This is very
effective in the case of regularly spaced data as Figures 6 and 7 show. One can imagine
however, that a data set with a number of clouds might benefit from a judicious choice
of points at which to carry out the pinning. What one would really like is a method
which does not rely on any user intelligence in the choice of points. As mentioned before,
a desirable feature of a good basis function is that it decays at infinity. This decay
should be at some rate if possible. The Beatson, Cherrie and Mouat prescription for
thin-plate splines in $\mathbb{R}^2$ is that the elements should decay like $|x|^{-3}$ as $|x| \to \infty$. There is a
problem here, in that if we opt for decay elements everywhere, then we will not obtain
a basis for our space. To get around this problem, we accept an element $F_i$ as a decay
element if it satisfies

\[ \sum_{j=1}^{50} |F_i(y_j) - \delta_{ij}| < \mu \]

and

\[ |F_i(x)| = O(|x|^{-3}) \quad \text{as } |x| \to \infty. \]

Otherwise,  we  use  the  Fi  which  is  defined  by  the  previous  conditions  of  cardinality.  Again 
there  are  a  few  bells  and  whistles  needed  to  make  this  method  operate  efficiently,  but 
we  hope  that  sufficient  detail  is  present  for  the  reader  to  be  able  to  see  the  general  idea. 
All  the  above  methods  are  providing  ways  of  constructing  a  better  conditioned  basis 
with  which  to  solve  the  problem.  A  method  still  has  to  be  selected  to  invert  the  matrix 
associated  with  the  new  basis,  which  is  now  much  better  conditioned  than  the  original 
matrix  corresponding  to  the  conventional  basis.  The  method  of  choice  for  most  authors 
is  some  version  of  GMRES. 

Beatson  called  the  points  at  which  decay  could  be  obtained  ‘good’  points,  and  points 
at  which  decay  could  not  be  obtained  ‘bad’  points.  This  idea  has  been  built  on  in  a 
recent  technical  report  by  Beatson  and  Levesley  [4].  The  general  spirit  is  to  define  good 
and  bad  points  in  the  same  way  as  Beatson,  and  then  to  develop  an  iterative  solver, 
solving  first  on  the  good,  then  the  bad,  then  returning  to  the  good  and  so  on. 
Finally, a very successful method has recently emerged from the researches of Beatson,
Light  and  Billings  [6].  This  method  has  the  advantage  that  it  is  a  fast  iterative  solver 
which  may  be  regarded  as  a  preconditioner  in  its  own  right  (thus  it  may  be  combined 
with  a  solver  such  as  GMRES).  We  will  describe  it  here  as  a  solver.  It  is  essentially 
the  domain  decomposition  method,  although  as  with  previous  solvers,  our  description 
will  be  very  much  at  a  ‘bare  bones’  level,  and  the  interested  reader  is  referred  to  [6]  for 
the  fine  details,  which  include  some  error  estimates,  some  interesting  comments  on  an 
alternative  basis,  and  a  good  deal  of  theory.  We  shall  describe  the  method  as  applied  to 
data on the unit square $[0,1]^2$ in $\mathbb{R}^2$, and we will not make any attempt to make the
method adaptive in character. The reader will be able to see these improvements for
herself. We will test our method on randomly chosen data in $[0,1]^2$.

We begin with a set of nodes $X = \{x_1, \dots, x_m\}$ at which interpolation is to be
carried out. We will describe the algorithm as it is implemented for solving the thin-plate
spline interpolation problem on the node set $X$. We divide up the square $[0,1]^2$
into a fairly large number of sub-domains $X_1, \dots, X_\ell$. There are two constraints on these
subdomains.  It  is  important  that  they  are  constructed  so  that  about  equal  numbers  of 
points  lie  in  each  subdomain  —  about  50  points  per  subdomain  is  ideal.  Secondly,  it  is 
essential  that  each  subdomain  overlap  all  surrounding  subdomains.  In  our  terminology, 
two  subdomains  overlap  if  they  have  a  (small)  number  of  points  in  common.  In  each 
subdomain  there  are  some  points  in  X  which  lie  only  in  that  subdomain  and  not  in  any 
other.  We  call  these  points  the  inner  points  of  the  subdomain.  A  coarse  set  Y  of  inner 
points  in  the  node  set  X  is  also  chosen.  We  will  say  more  about  this  coarse  set  in  a 
moment,  but  at  this  stage  it  simply  consists  of  a  small  number  of  inner  points  from  each 
subdomain.  The  algorithm  will  then  construct  the  interpolant  s  and  proceeds  as  follows. 
We  initialise  the  interpolant  s  as  s  =  0.  We  want  to  solve  the  equations 

\[ d_j = s(x_j) = \sum_{i=1}^{m} a_i |x_j - x_i|^2 \ln|x_j - x_i| + \alpha x_j + \beta, \qquad (j = 1, \dots, m) \qquad (3.3) \]

subject to the boundary conditions

\[ \sum_{i=1}^{m} a_i = \sum_{i=1}^{m} a_i s_i = \sum_{i=1}^{m} a_i t_i = 0, \qquad (3.4) \]

where $x_i = (s_i, t_i)$. In matrix form these equations are


as  we  have  already  seen.  Our  method  will  operate  by  residual  correction,  so  we  begin  by 
setting 


It is important to recall that $\alpha$ is a vector of length 2, which we write as $\alpha = (\alpha_1, \alpha_2)$.
Suppose  now  we  have  begun  our  iterative  procedure  and  generated  an  approximation  s 
with  a  residual  r.  The  next  few  steps  describe  how  to  update  the  approximation  and  the 
residual.

Step 1. We construct $s_1, \dots, s_\ell$ such that each $s_j$ is an interpolant based only on all
points of the subdomain $X_j$, using as data the residual vector $r$ restricted to $X_j$.

Step 2. For each inner point $x$ we now have a single real number $a_x$ which is the
coefficient of $|\cdot - x|^2 \ln|\cdot - x|$. If we look at the collection of coefficients belonging to all
the inner points of all domains, then this collection is not in general orthogonal to $\pi_1$.
That is, they fail to satisfy boundary conditions of the type given in Equation (3.4). We
now correct so that the collection of coefficients corresponding to all inner points of all
domains is orthogonal to $\pi_1$.

Step 3. We set

\[ S_1 = \sum \{ a_x\, |\cdot - x|^2 \ln|\cdot - x| : x \text{ is an inner point} \}. \qquad (3.5) \]

Step 4. We evaluate the residual $R = r - S_1$ at the coarse grid points, and then construct
the interpolant $S_2$ to this residual on the coarse grid points $Y$.

Step 5. We update $s$ by $s \leftarrow s + S_1 + S_2$. The new residual is then given by


where


$z_i = d_i - s(x_i)$, $i = 1, \dots, m$.


This iterative process can either be continued to convergence, or used as a preconditioner
followed by GMRES. Table 2 shows some run times taken to obtain an error of less than
$1 \times 10^{-6}$ for the Franke 1 function (see [13] for the definition of this function). Random
nodes were generated in $[0,1]^2$ and an Intel Celeron PC was used. Recently,


Number of nodes    Number of iterations    Time (seconds)
10,000             8                       7.0
20,000             8                       17.5
40,000             6                       35.5
80,000             6                       105.7
160,000            7                       407.8

Tab. 2 Run times for domain decomposition.

the  group  at  Leicester,  using  a  twin  processor  Compaq  PC,  has  obtained  solutions  to 
a  problem  with  1,000,000  random  points  in  less  than  9  minutes,  and  we  can  safely  say 
that  the  combination  of  domain  decomposition  methods  and  multipole  fast  evaluation 
has  produced  a  robust  and  effective  method.  Most  practitioners  will  be  aware  of  other 
ways  to  run  a  domain  decomposition  algorithm.  In  particular,  one  can  use  a  nesting 
approach  where  one  starts  with  only  four  subdomains  each  containing  large  numbers 
of  points.  To  solve  each  subdomain  problem,  one  subdivides  again  and  does  domain 
decomposition  in  the  subdomain. 
Abstract


Procedures for orthogonalisation of Gaussians and B-splines are recalled and it is shown
that, provided Gaussians are negligible in appropriate regions, the same recurrence formulae
may be adopted in both and render the computation relatively efficient. Chebyshev
polynomial collocation is well known to be rapidly defined by discrete orthogonalisation,
and similar ideas are commonly applicable to partial differential equations (PDEs) and
integral equations (IEs). However, it is shown that the most elementary mixed methods
(both boundary conditions and PDEs being satisfied) for the Dirichlet problem in rectangular
types of domain can lead to a singular linear system, which may be rendered
non-singular, for example, by a small modification of interpolation nodes.


1  Introduction 

Gaussian radial basis functions (RBFs) are negligible outside a certain range, which depends
on the accuracy required and the exponent used. For example, if four decimal place
accuracy is sufficient, then outside $[-2, 2]$ the function $e^{-\lambda x^2}$ is negligible for $\lambda > 2.5$.
Indeed the translated RBFs

\[ \phi_i(x) = e^{-\lambda (x-i)^2}, \qquad i = -1, 0, \dots, n+1, \qquad (1.1) \]

resemble, at least superficially, a set of translated cubic B-splines, each having a support
of four sub-intervals of length one, contained in $[i-2, i+2]$.
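
The threshold quoted above is easy to check. The short sketch below assumes that "four decimal place accuracy" means a tolerance of half a unit in the fourth decimal place; that interpretation, and the name `gaussian`, are assumptions made for illustration.

```python
import math

def gaussian(x, lam):
    """The Gaussian exp(-lam * x^2)."""
    return math.exp(-lam * x * x)

tol = 0.5e-4    # assumed meaning of four decimal place accuracy

for lam in (2.0, 2.5, 3.0):
    value = gaussian(2.0, lam)
    print(lam, value, value < tol)
```

Since the Gaussian is decreasing in $|x|$, a value below the tolerance at $x = 2$ means the function is below the tolerance everywhere outside $[-2, 2]$.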

Following  work  of  Mason  et  al  [4]  and  Goodman  et  al  [1],  we  show  that  these  RBFs, 
rounded  to  the  required  accuracy,  may  be  conveniently  and  efficiently  orthogonalised  so 
that 
(i)  a  4  term  recurrence  may  be  adopted  identical  to  the  one  in  [4]
for  cubic  B-splines, 

(ii)  inner  products  may  be  determined  very  simply  in  terms  of  4  parts 
of  a  normal  distribution, 

(iii) a well conditioned calculation results and best $\ell_2$ approximations
may  be  obtained  immediately  with  an  orthogonalised  basis, 

(iv)  a  continuous  or  discrete  inner  product  (and  best  approximation) 
may  be  adopted. 

In  a  second  application  of  orthogonalisation,  this  time  to  polynomials,  it  is  shown 
that a two-dimensional $(n+1) \times (n+1)$ polynomial collocation problem, which includes
amongst its nodes $n$ Chebyshev polynomial zeros on each of 4 sides of a square, leads to
a singular (rank one deficient) system. For all $n$, one superfluous equation is readily identified
and a suitable replacement equation is readily found. Discrete orthogonalisation is
used to combine and greatly simplify the equations and prove singularity.

2  Orthogonalised  Gaussians 

An orthogonal system $\{P_i\}$ is developed from the Gaussians $\phi_i$ in (1.1) using

\[ P_k = \phi_k - a_{k1}P_{k-1} - a_{k2}P_{k-2} - a_{k3}P_{k-3}, \qquad k = -1, \dots, n+1, \qquad (2.1) \]

where $a_{13} = a_{03} = a_{02} = a_{-1,3} = a_{-1,2} = a_{-1,1} = 0$.


Now we define coefficients $b_{kr}$, for $r = 0, 1, 2, 3$ and $k = -1, \ldots, n+1$, as the inner
products

$$b_{kr} = (\phi_k, \phi_{k-r}) = \int_{I_{k,r}} \phi_k \phi_{k-r} \, dx, \qquad (2.2)$$

where $I_{k,r}$ is the common support of $\phi_k$ and $\phi_{k-r}$, and normalising constants $n_k$ are the
squared norms

$$n_k = \|P_k\|^2 = (P_k, P_k), \qquad (2.3)$$

where $(\cdot, \cdot)$ is the inner product (2.2) and $\|\cdot\|$ is the corresponding norm.

Then, setting $(P_k, P_{k-r}) = 0$ for $r = 1, 2, 3$ gives

$$(\phi_k, P_{k-r}) = a_{kr} n_{k-r}. \qquad (2.4)$$

Taking the inner product of (2.1) with itself gives

$$n_k = b_{k0} + \sum_{r=1}^{3} \left[ -2 a_{kr} (\phi_k, P_{k-r}) + a_{kr}^2 n_{k-r} \right], \qquad (2.5)$$

which, by using (2.4), gives

$$n_k = b_{k0} - \sum_{r=1}^{3} a_{kr}^2 n_{k-r}. \qquad (2.6)$$

This is the first basic equation for writing $\{n_k\}$ in terms of $\{a_{kr}\}$ and $\{b_{kr}\}$.

Now, using (2.1), with $k$ replaced by $k-1$, $k-2$, $k-3$ we obtain

$$(\phi_k, P_{k-3}) = b_{k3} = a_{k3} n_{k-3}. \qquad (2.7)$$

Hence

$$(\phi_k, P_{k-2}) = b_{k2} - a_{k-2,1} (\phi_k, P_{k-3}),$$

so that

$$a_{k2} n_{k-2} = b_{k2} - a_{k-2,1} b_{k3}. \qquad (2.8)$$

Finally

$$(\phi_k, P_{k-1}) = b_{k1} - a_{k-1,1} (\phi_k, P_{k-2}) - a_{k-1,2} (\phi_k, P_{k-3}),$$

so that

$$a_{k1} n_{k-1} = b_{k1} - a_{k-1,1} (a_{k2} n_{k-2}) - a_{k-1,2} b_{k3}. \qquad (2.9)$$

Equations (2.6), (2.7), (2.8) and (2.9) may be solved to determine all the required
coefficients $\{a_{kr}\}$ and $\{n_k\}$ explicitly by substitution, starting from $n_{-1} = \|\phi_{-1}\|^2$. This
involves $O(n)$ operations for $n + 3$ basis functions. The best approximation to a function
$f$ (either continuous $f = f(x)$ or discrete $f = (f_1, \ldots, f_m)^T$) by orthogonalised
Gaussians may be determined explicitly as

$$f \approx \sum_{j=-1}^{n+1} c_j P_j,$$

where $c_j = (P_j, P_j)^{-1} (f, P_j) = n_j^{-1} (f, P_j)$.
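
The substitution process described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the inner product is taken to be a discrete midpoint-rule sum on a uniform grid, the Gaussians are set to zero outside the open interval $(k-2, k+2)$, and the names `orthogonalise_gaussians`, `best_coefficients` and `pts_per_unit` are hypothetical. Array index `q` stands for the paper's index $k = q - 1$.

```python
import numpy as np

def orthogonalise_gaussians(n, lam=2.5, pts_per_unit=40):
    """Run the four-term recurrence (2.6)-(2.9) with a discrete inner product.

    Array index q corresponds to the paper's index k = q - 1 (q = 0 is k = -1).
    """
    centres = np.arange(-1, n + 2)                      # k = -1, 0, ..., n+1
    h = 1.0 / pts_per_unit
    # midpoint grid on [-3, n+3]; midpoints never sit exactly at distance 2 from a centre
    x = -3.0 + h * (np.arange((n + 6) * pts_per_unit) + 0.5)
    d = x[None, :] - centres[:, None]
    # assumption: Gaussians rounded to zero outside (k-2, k+2)
    phi = np.where(np.abs(d) < 2.0, np.exp(-lam * d**2), 0.0)

    def inner(u, v):
        return h * float(np.dot(u, v))                  # simple quadrature sum

    def b(q, r):                                        # b_{kr} = (phi_k, phi_{k-r})
        return inner(phi[q], phi[q - r]) if q - r >= 0 else 0.0

    K = len(centres)
    a = np.zeros((K, 4))                                # a[q, r] for r = 1, 2, 3
    nk = np.zeros(K)                                    # squared norms n_k
    for q in range(K):
        if q >= 3:                                      # (2.7)
            a[q, 3] = b(q, 3) / nk[q - 3]
        if q >= 2:                                      # (2.8)
            a[q, 2] = (b(q, 2) - a[q - 2, 1] * b(q, 3)) / nk[q - 2]
        if q >= 1:                                      # (2.9)
            t2 = a[q - 1, 1] * a[q, 2] * nk[q - 2] if q >= 2 else 0.0
            a[q, 1] = (b(q, 1) - t2 - a[q - 1, 2] * b(q, 3)) / nk[q - 1]
        # (2.6)
        nk[q] = b(q, 0) - sum(a[q, r] ** 2 * nk[q - r] for r in (1, 2, 3) if q >= r)

    # build P_k from (2.1) so that the result can be inspected
    P = np.zeros_like(phi)
    for q in range(K):
        P[q] = phi[q] - sum(a[q, r] * P[q - r] for r in (1, 2, 3) if q >= r)
    return x, h, P, a, nk

def best_coefficients(f_vals, P, nk, h):
    """c_j = (f, P_j) / n_j for samples f_vals taken on the same grid."""
    return h * (P @ f_vals) / nk
```

With these assumptions the Gram matrix `h * P @ P.T` should be diagonal up to rounding, with diagonal entries equal to the computed `nk`, and `best_coefficients` then gives the expansion coefficients without solving a linear system.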

2.1  Numerical  example 

Here we use the procedure for constructing orthogonalised Gaussians to produce an
interpolant to data obtained from a fast response oscilloscope¹. To the left of Figure 1
we see the first three orthogonalised Gaussian functions, with centres specified at the
integers -1, 0 and 1, with support growing from left to right. The figure on the right
shows the oscilloscope data (marked **) and the fitted o-Gaussian interpolant (solid line).

¹ Oscilloscope data supplied by Centre for Electromagnetic and Time Metrology, National Physical
Laboratory, London, UK.

In this example we use 512 centres and choose $\lambda = 2.5$ in (1.1). Since our choice
for $\lambda$ requires only four decimal place accuracy, the normal equations produce the usual
identity matrix and the coefficient vector $\{c_{-1}, \ldots, c_{n+1}\}$ can then be determined by the
equations $c = A^T f$ where $f = \{f_1, \ldots, f_m\}$ and $A_{ij} = P_j(x_i)$. The fit is extremely good
and vindicates the neglecting of the Gaussians outside the interval considered.


Fig. 1. First three orthogonalised basis functions and o-Gaussian fit to oscilloscope data.

2.2  Extensions  to  orthogonalised  Gaussians 

The  following  extensions  are  clearly  possible. 

(i)  Use  of  generally  placed  centres  (knots)  and/or  a  discrete  inner  product. 

(ii)  Use  of  higher  dimensions  -  as  in  Anderson  et  al  [2]. 

(iii) Replacement of interval $(-\infty, \infty)$ in a continuous norm by $[0, n]$
and $[0, n]$ by $[0, 1]$ using scaling.

(iv) Consideration of a function with wider (approximate) support, such as $[-3, 3]$
or more generally $[-r, r]$ for $r > 2$.

3  Chebyshev  polynomials  in  two-dimensional  collocation 

The (first kind) Chebyshev polynomial $T_i(x)$ of degree $i$ is defined by

$$T_i(x) = \cos i\theta, \quad i = 0, 1, \ldots, \quad -1 \le x \le 1, \qquad (3.1)$$

where $x = \cos\theta$ and $0 \le \theta \le \pi$.

Among its many properties is the discrete orthogonality property

$$\sum_{k=1}^{m} T_i(x_k) T_j(x_k) = \begin{cases} 0 & \text{for } i \ne j, \\ m & \text{for } i = j = 0, \\ m/2 & \text{for } i = j \ne 0, \end{cases} \qquad (3.2)$$

where $x_k$ are the $m$ zeros of $T_m(x)$, namely

$$x_k = \cos\left( \frac{(2k-1)\pi}{2m} \right), \quad k = 1, \ldots, m. \qquad (3.3)$$

The orthogonality property of (3.2) is not a unique one amongst the Chebyshev
polynomials of four kinds. Indeed, Mason and Venturino [5] showed that there are at least
fourteen such formulae, depending on alternative weights, choices of Chebyshev-related
abscissae and kinds of Chebyshev polynomial.
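
As a quick numerical illustration of (3.2), the following Python sketch evaluates the discrete sums at the zeros (3.3) for indices $i, j < m$. The function names are hypothetical and chosen for this example only.

```python
import math

def chebyshev_zeros(m):
    """The m zeros of T_m, as in (3.3)."""
    return [math.cos((2 * k - 1) * math.pi / (2 * m)) for k in range(1, m + 1)]

def T(i, x):
    """First-kind Chebyshev polynomial via T_i(cos(theta)) = cos(i * theta)."""
    return math.cos(i * math.acos(x))

def discrete_inner(i, j, m):
    """Left-hand side of (3.2): the sum of T_i(x_k) T_j(x_k) over the zeros of T_m."""
    return sum(T(i, x) * T(j, x) for x in chebyshev_zeros(m))
```

For example, `discrete_inner(0, 0, 8)` is close to 8, `discrete_inner(3, 3, 8)` is close to 4, and `discrete_inner(2, 5, 8)` is close to 0.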

3.1 The elliptic problem - mixed methods

Let us now exploit this property (3.2) in a pseudo-spectral method for a linear elliptic
PDE problem on a square. The PDE

$$Lu = f(x,y), \quad |x|, |y| < 1, \qquad (3.4)$$

subject to

$$u = g(x,y), \qquad (3.5)$$

where $g(x,y)$ is a function known explicitly only on $x = \pm 1$ and $y = \pm 1$, can be solved
approximately in the form

$$u = u_{mn} = \sum_{i=0}^{m}{}' \sum_{j=0}^{n}{}' a_{ij} T_i(x) T_j(y), \qquad (3.6)$$

where a dashed summation denotes that the first term in a sum is halved.

To obtain equations for $a_{ij}$, we solve

$$L u_{mn} = f \quad \text{at the } (m-1) \times (n-1) \text{ zeros of } T_{m-1}(x) T_{n-1}(y), \qquad (3.7)$$

$$u_{mn} = g \quad \text{on } x = \pm 1 \text{ at zeros of } T_n(y) \ (2n \text{ equations}), \qquad (3.8)$$

$$u_{mn} = g \quad \text{on } y = \pm 1 \text{ at zeros of } T_m(x) \ (2m \text{ equations}). \qquad (3.9)$$

Together (3.7)-(3.9) form $(m+1) \times (n+1)$ equations for $\{a_{ij}\}$. However, we claim that
the included equations (3.8), (3.9) are singular of joint rank $2m + 2n - 1$. If this is so,
then the system is singular without consideration of the PDE collocation equations (3.7).
The equations (3.8), (3.9) become


$$g_{k,\pm 1} = \sum_{i=0}^{m}{}' \sum_{j=0}^{n}{}' a_{ij} T_i(x_k) T_j(\pm 1), \quad g_{\pm 1,\ell} = \sum_{i=0}^{m}{}' \sum_{j=0}^{n}{}' a_{ij} T_i(\pm 1) T_j(y_\ell), \qquad (3.10)$$

where $x_k$, $y_\ell$ are zeros of $T_m(x)$, $T_n(y)$ respectively and where

$$g_{1,\ell} = g(1, y_\ell), \quad g_{-1,\ell} = g(-1, y_\ell), \quad g_{k,1} = g(x_k, 1), \quad g_{k,-1} = g(x_k, -1).$$

If we add/subtract the first pair and also the second pair of equations in (3.10), noting
that

$$T_j(1) = 1, \quad T_j(-1) = (-1)^j,$$

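we deduce that

$$r_k^{(0)} = \sum_{i=0}^{m}{}' \sum_{\substack{j=0 \\ j\ \mathrm{even}}}^{n}{}' a_{ij} T_i(x_k), \quad r_k^{(1)} = \sum_{i=0}^{m}{}' \sum_{\substack{j=0 \\ j\ \mathrm{odd}}}^{n} a_{ij} T_i(x_k), \quad k = 1, \ldots, m, \qquad (3.11)$$

$$c_\ell^{(0)} = \sum_{\substack{i=0 \\ i\ \mathrm{even}}}^{m}{}' \sum_{j=0}^{n}{}' a_{ij} T_j(y_\ell), \quad c_\ell^{(1)} = \sum_{\substack{i=0 \\ i\ \mathrm{odd}}}^{m} \sum_{j=0}^{n}{}' a_{ij} T_j(y_\ell), \quad \ell = 1, \ldots, n, \qquad (3.12)$$

where

$$r_k^{(0)} = \tfrac12 (g_{k,1} + g_{k,-1}), \quad r_k^{(1)} = \tfrac12 (g_{k,1} - g_{k,-1});$$

$$c_\ell^{(0)} = \tfrac12 (g_{1,\ell} + g_{-1,\ell}), \quad c_\ell^{(1)} = \tfrac12 (g_{1,\ell} - g_{-1,\ell}).$$

Multiplying (3.11) by $2T_r(x_k)/m$ and summing over $k$, and multiplying (3.12) by
$2T_s(y_\ell)/n$ and summing over $\ell$, discrete orthogonality (3.2) gives

$$R_{r+1}^{(0)} \equiv \sum_{\substack{j=0 \\ j\ \mathrm{even}}}^{n}{}' a_{rj} = \hat{r}_{r+1}^{(0)}, \quad R_{r+1}^{(1)} \equiv \sum_{\substack{j=0 \\ j\ \mathrm{odd}}}^{n} a_{rj} = \hat{r}_{r+1}^{(1)}, \quad r = 0, \ldots, m-1, \qquad (3.13)$$

$$C_{s+1}^{(0)} \equiv \sum_{\substack{i=0 \\ i\ \mathrm{even}}}^{m}{}' a_{is} = \hat{c}_{s+1}^{(0)}, \quad C_{s+1}^{(1)} \equiv \sum_{\substack{i=0 \\ i\ \mathrm{odd}}}^{m} a_{is} = \hat{c}_{s+1}^{(1)}, \quad s = 0, \ldots, n-1, \qquad (3.14)$$

where

$$\hat{r}_{r+1}^{(t)} = \frac{2}{m} \sum_{k=1}^{m} r_k^{(t)} T_r(x_k), \quad \hat{c}_{s+1}^{(t)} = \frac{2}{n} \sum_{\ell=1}^{n} c_\ell^{(t)} T_s(y_\ell), \quad t = 0, 1.$$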

This constitutes a greatly simplified system to replace (3.10). Indeed we may verify
that, for $m = n$,

$$\sum_{\substack{i=0 \\ m-i\ \mathrm{odd}}}^{m-1}{}' R_{i+1}^{(t)} = \sum_{\substack{i=0 \\ m-i\ \mathrm{odd}}}^{m-1}{}' C_{i+1}^{(t)}, \qquad (3.15)$$

where $t = 0, 1$ for $m$ odd, even, respectively, and hence that the equations (3.13) and
(3.14) are singular. For example, for $m\ (= n) = 2$, we seek equations in $a_{00}, \ldots, a_{22}$, and
(3.13) gives

$$R_1^{(0)} = \tfrac12 a_{00} + a_{02}, \quad R_1^{(1)} = a_{01}, \quad R_2^{(0)} = \tfrac12 a_{10} + a_{12}, \quad R_2^{(1)} = a_{11}; \qquad (3.16)$$

meanwhile (3.14) gives

$$C_1^{(0)} = \tfrac12 a_{00} + a_{20}, \quad C_1^{(1)} = a_{10}, \quad C_2^{(0)} = \tfrac12 a_{01} + a_{21}, \quad C_2^{(1)} = a_{11}. \qquad (3.17)$$


Clearly $R_2^{(1)} = C_2^{(1)}$, consistent with (3.15) for $m = 2$. Which equation do we eliminate?
For simplicity, in the case of $m$ even, we delete the equation for $C_2^{(1)}$ and replace it by
the equation for $R_{m+1}^{(0)}$. It is easy to verify that, within the system (3.13) and (3.14), this
leads to full rank, and is equivalent to boundary specifications of either of

$$u(0, 1) + u(0, -1), \qquad u(1, 1) + u(-1, 1) + u(1, -1) + u(-1, -1). \qquad (3.18)$$

For $m = n = 2$, this is equivalent to

$$R_3^{(0)} \equiv \tfrac12 a_{20} + a_{22}. \qquad (3.19)$$

In  the  case  when  m  is  odd,  we  delete  the  equation  for  C ^  and  replace  it  by  the 
equation  for  C^x,  the  latter  being  equivalent  to  adding  four  boundary  point  conditions 
anti-symmetrically,  i.e., 


$$u(1, 1) - u(-1, 1) + u(-1, -1) - u(1, -1). \qquad (3.20)$$
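
The rank deficiency claimed for the boundary equations (3.8), (3.9) can be checked numerically. The sketch below is an illustration only: it assembles the coefficient matrix of (3.10), with one column per unknown $a_{ij}$ and the dashed-sum halving applied to index 0, and relies on NumPy's `numpy.linalg.matrix_rank` with its default tolerance. The function names are hypothetical.

```python
import numpy as np

def chebyshev_zeros(m):
    """The m zeros of T_m."""
    k = np.arange(1, m + 1)
    return np.cos((2 * k - 1) * np.pi / (2 * m))

def T(i, x):
    """First-kind Chebyshev polynomial T_i(x) for -1 <= x <= 1."""
    return np.cos(i * np.arccos(x))

def boundary_matrix(m, n):
    """Rows: the 2m + 2n boundary equations; columns: a_ij ordered by (i, j)."""
    xk, yl = chebyshev_zeros(m), chebyshev_zeros(n)
    pts = [(x, s) for s in (1.0, -1.0) for x in xk]      # y = +-1 at zeros of T_m
    pts += [(s, y) for s in (1.0, -1.0) for y in yl]     # x = +-1 at zeros of T_n
    half = lambda i: 0.5 if i == 0 else 1.0              # dashed sums halve index 0
    return np.array([[half(i) * half(j) * T(i, x) * T(j, y)
                      for i in range(m + 1) for j in range(n + 1)]
                     for (x, y) in pts])

def boundary_rank(m, n):
    return int(np.linalg.matrix_rank(boundary_matrix(m, n)))
```

For $m = n = 2$ the matrix has 8 rows and 9 columns, and `boundary_rank(2, 2)` returns 7, in agreement with the single dependency $R_2^{(1)} = C_2^{(1)}$ noted above.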

If $g(x,y)$ is known everywhere in the square, then we could of course consider replacing a
mixed collocation problem by an interior collocation problem by including the boundary
conditions automatically in the form of approximations. For example, we could replace
the form (3.6) by

$$u_{mn} = (x^2 - 1)(y^2 - 1) \sum_{i=0}^{m-2}{}' \sum_{j=0}^{n-2}{}' a_{ij} T_i(x) T_j(y) + g(x,y), \qquad (3.21)$$

or by an alternative form such as

$$u_{mn} = \sum_{i=0}^{m} \sum_{j=0}^{n} a_{ij} \left( T_i(x) - T_i^*(x) \right) \left( T_j(y) - T_j^*(y) \right) + g(x,y), \qquad (3.22)$$

where $T_i^* = T_0(x)$ or $T_1(x)$ according as $i$ is even or odd. These forms have the
disadvantage of being difficult to generalise to other kinds of (non-rectangular) boundaries,
although (3.21) is adaptable to the case where an equation of the boundary is known
(see Mason [3]).

The best Chebyshev method available for the Poisson problem on a rectangle is
probably a “differentiation matrix” method, such as is described in Trefethen [6], which
represents the solution by nodal values rather than Chebyshev coefficients.

Acknowledgement: We thank the referees for their perceptive remarks.

Abstract

Scattered  exact  and  non-exact  data  are  approximated  by  means  of  radial  basis  functions 
with  compact  support  and  the  related  knot  selection  is  based  on  the  information  given 
by  the  discrete  Gaussian  curvature  defined  on  a  data  triangulation.  In  case  of  non-exact 
data,  a  strategy  to  obtain  a  sign-reliable  estimate  of  its  distribution  is  given  extending 
an  approach  already  studied  by  the  authors  for  non-exact  2D  data. 

1  Introduction 

It is well known that, for any interpolation/approximation scheme, data shape preservation
is often a desirable quality and, as a consequence, the determination of some criteria
to establish the data shape is a very important topic. For this purpose, the use of the
discrete curvature in case of exact 2D data is a standard approach. On the other hand, in
case of non-exact data, the proposal in [6] allows the determination of a reasonable and
sign-reliable discrete curvature estimate if the maximum data error is a priori given. In
recent literature, interesting formulas have been introduced [3, 4] for defining the discrete
Gaussian curvature when scattered 3D exact data are given and a related triangulation
is assigned. Starting from these formulas, the approach considered in [6] is extended to
the case of 3D scattered non-exact data in order to define a reasonable and sign-reliable
estimate of the Gaussian curvature at the data points thereby obtaining important shape
information. Thus we get some suggestions for determining the supports of the local radial
basis functions [8] used in the approximation scheme together with the number, the
position and the multiplicity of the related knots. The result is a good approximating
surface (in particular with respect to its shape) with a high data reduction [2, 7].

The  outline  of  the  paper  is  as  follows.  In  Section  2  the  discrete  Gaussian  curvature  is 
defined  and  an  inequality  is  given  to  check  its  sign-reliability  in  case  of  non-exact  data. 
In  Section  3  the  approximation  scheme  is  presented  and  the  knot  selection  strategy  is 
given.  Finally,  in  Section  4  some  numerical  results  are  presented  to  illustrate  the  features 
of  the  proposed  approach. 

2  Information  about  the  shape 

In this section, following the approach presented in [3, 4], we define the discrete Gaussian
curvature (dGc) to obtain information about the shape suggested by the data. For this
purpose, we need the following notation

• $V_{xy} := \{X_j = (x_j, y_j),\ j = 1, \ldots, N\} \subset \mathbb{R}^2$ is the set of the assigned distinct
vertices on the $xy$-plane;

• $V := \{P_j = (X_j, z_j),\ j = 1, \ldots, N\} \subset \mathbb{R}^3$ is the data set, with $z_j = f(X_j)$;

• $\mathcal{T} := \{l_j \in \mathbb{N}^3,\ 1 \le l_{jk} \le N,\ k = 1, 2, 3,\ j = 1, \ldots, T\}$ is a given triangulation of $V_{xy}$.

Thus, for any $X_j \in V_{xy}$ not belonging to the boundary of the convex hull of $V_{xy}$ we can
define the integral Gaussian curvature with respect to a related area $S_j$, [3]

fc=l 

where the angles $\alpha_k^{(j)}$, $k = 1, \ldots, n_j$, are as follows

$$\alpha_k^{(j)} := \angle(e_k^{(j)}, e_{k+1}^{(j)}), \quad e_k^{(j)} := V_k^{(j)} - P_j,$$


o'*" 


and $\{V_1^{(j)}, \ldots, V_{n_j}^{(j)}\} \subset V$ is the set of ordered neighboring points of $P_j$ given by the
assigned triangulation. To derive the curvature at the vertex $P_j$ from the above integral
value, we normalize by the Voronoi area $S_j$ [4]


(2.1) 

If $X_j$ is on the boundary of the convex hull of $V_{xy}$, some auxiliary suitable “phantom”
points should be defined in order to obtain a reliable estimate of the Gaussian curvature
from (2.1).
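
For illustration, a vertex-based curvature estimate of this kind can be sketched in Python as follows. The sketch assumes the standard angle-deficit form, $2\pi$ minus the sum of the angles $\alpha_k^{(j)}$ between consecutive edge vectors, with the neighbors ordered cyclically and the last edge paired with the first. The normalising area is supplied by the caller, since its computation is not shown here, and the function names are hypothetical.

```python
import math

def angle_between(u, v):
    """Angle between two 3D vectors, in radians."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return math.acos(max(-1.0, min(1.0, dot / (nu * nv))))

def angle_deficit(P, neighbours):
    """2*pi minus the sum of angles between consecutive edges e_k = V_k - P."""
    e = [tuple(v - p for v, p in zip(V, P)) for V in neighbours]
    n = len(e)
    total = sum(angle_between(e[k], e[(k + 1) % n]) for k in range(n))
    return 2.0 * math.pi - total

def discrete_gaussian_curvature(P, neighbours, area):
    """Angle deficit normalised by a caller-supplied area (assumed positive)."""
    return angle_deficit(P, neighbours) / area
```

A flat neighborhood gives a zero deficit, a peak gives a positive value and a saddle a negative one, which is the sign information exploited in the rest of the section.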


Fig.  1.  The  triangulation  (left)  and  the  discrete  Gaussian  curvature  (right). 


Shown on the left of Figure 1 is the Delaunay triangulation related to a set $V_{xy}$ of
441 scattered vertices in the unit square and shown on the right is the discrete Gaussian
curvature distribution related to the Franke function sampled on $V_{xy}$.

In case of non-exact data, we need to check the sign-reliability of $K_j$ for deriving
some useful information about the shape suggested by the data. For this purpose, we
use the theorem below, where


Si) 


>U).  G(i) 

k  L.  -  1  n-\  , 

,  /V  — -  1  ,  .  .  .  ,  / 1 J  , 


/c, 


(2.2) 


Remark  2.1  Kj  is  an  approximation  of  Kj  obtained  by  replacing  the  angle  with 


Theorem 2.2 Let $P_j \in \mathbb{R}^3$, $j = 1, \ldots, N$, be assigned distinct non-exact data points
and let $\varepsilon$ be a positive quantity such that $|P_j - P_j^e| \le \varepsilon$, $j = 1, \ldots, N$, where $P_j^e$ is the
(unknown) exact data point corresponding to $P_j$. If $\varepsilon$ is sufficiently small and

to  i  >5^Et4;'  <2-3) 

k—1  leA:  I 

then

$$K_j K_j^e > 0,$$

where $K_j^e$ is defined as $K_j$ using the exact data points.

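Proof: Let us consider a point $P_j$ and its neighboring points $\{V_1^{(j)}, \ldots, V_{n_j}^{(j)}\} \subset V$ and
let us write the corresponding (unknown) exact points as follows

$$P_j^e := P_j - \varepsilon_0 w_0, \qquad V_k^{(j)e} := V_k^{(j)} - \varepsilon_k w_k, \quad k = 1, \ldots, n_j,$$

with $0 \le \varepsilon_0, \varepsilon_1, \ldots, \varepsilon_{n_j} \le \varepsilon$ and $|w_0| = |w_1| = \cdots = |w_{n_j}| = 1$.

So, if $\varepsilon$ is sufficiently small, we can define the non-zero vectors

$$e_k^{(j)e} := V_k^{(j)e} - P_j^e,$$

and we have

$$e_k^{(j)e} = e_k^{(j)} - \varepsilon_k w_k + \varepsilon_0 w_0.$$

Thus, if

$$c_k^{(j)e} := \frac{e_k^{(j)e} \cdot e_{k+1}^{(j)e}}{|e_k^{(j)e}| \, |e_{k+1}^{(j)e}|},$$

using a first order Taylor approximation, we obtain (with $c_k^{(j)}$ the same ratio formed
from $e_k^{(j)}$ and $e_{k+1}^{(j)}$)

$$c_k^{(j)e} = c_k^{(j)} (1 + A_k) + \frac{B_k}{|e_k^{(j)}| \, |e_{k+1}^{(j)}|} + O(\varepsilon^2),$$

where

$$A_k = \varepsilon_k \frac{e_k^{(j)} \cdot w_k}{|e_k^{(j)}|^2} + \varepsilon_{k+1} \frac{e_{k+1}^{(j)} \cdot w_{k+1}}{|e_{k+1}^{(j)}|^2} - \varepsilon_0 \left( \frac{e_k^{(j)}}{|e_k^{(j)}|^2} + \frac{e_{k+1}^{(j)}}{|e_{k+1}^{(j)}|^2} \right) \cdot w_0, \qquad (2.4)$$

$$B_k = -\varepsilon_k \, e_{k+1}^{(j)} \cdot w_k - \varepsilon_{k+1} \, e_k^{(j)} \cdot w_{k+1} + \varepsilon_0 \, (e_k^{(j)} + e_{k+1}^{(j)}) \cdot w_0.$$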


Thus,  we  can  write 


=  K3  ( 1  +  }  + 


Bk 

tU)\\j3)  I 
'fc  llefc+ll 


+  0{e 


So,  if  e  is  sufficiently  small,  /G/Cf  >  0  if 


5f:E  M*^  + 


l^lle&l 


and  this  is  true  if 


'Jl  fc=i 


ek  iiefc+i 


>  -4/5. 


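Now, from (2.4) it is easy to verify that $|A_k| \le 2\varepsilon \, (|e_k^{(j)}|^{-1} + |e_{k+1}^{(j)}|^{-1})$ and
$|B_k| \le 2\varepsilon \, (|e_k^{(j)}|^{-1} + |e_{k+1}^{(j)}|^{-1}) \, |e_k^{(j)}| \, |e_{k+1}^{(j)}|$. Using these inequalities, after a little algebra, we obtain
that, if $\varepsilon$ is sufficiently small, (2.3) implies (2.5). □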
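
The first order expansion of the perturbed cosine used in the proof can be checked numerically. The following Python sketch is an illustration only, with hypothetical function names. It compares the exact cosine of the angle between two perturbed edge vectors with the approximation $c(1 + A) + B / (|e_1| |e_2|)$, where $e_1$, $e_2$ play the roles of $e_k^{(j)}$, $e_{k+1}^{(j)}$, the edges are perturbed to $e - \varepsilon_k w_k + \varepsilon_0 w_0$, and $A$, $B$ are first order in the perturbation sizes.

```python
import numpy as np

def cosine(u, v):
    """Cosine of the angle between two vectors."""
    return float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

def first_order_cosine(e1, e2, eps0, w0, eps1, w1, eps2, w2):
    """c (1 + A) + B / (|e1| |e2|), with e1 = e_k and e2 = e_{k+1}."""
    n1, n2 = np.linalg.norm(e1), np.linalg.norm(e2)
    A = (eps1 * (e1 @ w1) / n1**2 + eps2 * (e2 @ w2) / n2**2
         - eps0 * ((e1 / n1**2 + e2 / n2**2) @ w0))
    B = -eps1 * (e2 @ w1) - eps2 * (e1 @ w2) + eps0 * ((e1 + e2) @ w0)
    return cosine(e1, e2) * (1.0 + A) + B / (n1 * n2)

def exact_cosine(e1, e2, eps0, w0, eps1, w1, eps2, w2):
    """Cosine between the perturbed edges e - eps_k w_k + eps_0 w_0."""
    return cosine(e1 - eps1 * w1 + eps0 * w0, e2 - eps2 * w2 + eps0 * w0)
```

For perturbations of size about $10^{-4}$ and edges of length about one, the two values should differ only at second order, that is by roughly $10^{-8}$.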

If $\varepsilon$ is an assigned small positive quantity such that $|P_j - P_j^e| \le \varepsilon$, $j = 1, \ldots, N$, and (2.3)
holds, we use (2.1) to define $K_j$ because we consider it sign-reliable. Otherwise, we try
to get information about the sign of the Gaussian curvature at the point $P_j$, repeating
the check after substituting the neighboring points of $P_j$ with other new suitable $n_j$
points. In particular, these are chosen among the neighbors of all the $V_k^{(j)}$, $k = 1, \ldots, n_j$,
and they are uniformly spaced as much as possible with respect to the azimuth (defined
relating to $P_j$). If after this substitution (2.3) holds, the new neighboring points are used
to define $K_j$ through (2.1); otherwise this strategy is repeated until we consider that the
new neighbors are too far from $P_j$. In the last case, we put the curvature value equal to
0.

3  Knot  selection  in  radial  approximation 

Let $\phi : \mathbb{R}_{\ge 0} \to \mathbb{R}$ be a compactly supported radial basis function. We approximate the
given data by the surface

$$z(X) := a_0 + \sum_{l=1}^{M} a_l \, \phi\!\left( \frac{\|X - X_l^*\|}{\delta_l} \right),$$

where the set of knots $\{X_l^*,\ l = 1, \ldots, M\} \subset V_{xy}$ and the set of positive $\delta$-parameters
$\{\delta_l,\ l = 1, \ldots, M\}$ are previously chosen. The coefficients $a_0, \ldots, a_M$ are determined
minimizing $\sum_{j=1}^{N} (z_j - z(X_j))^2$. The knot number and their positions are selected considering
the information given by the discrete Gaussian curvature distribution as defined in the
previous section.
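
A least squares fit of this form can be sketched with NumPy as follows. This is an illustration under stated assumptions rather than the authors' code: the basis function used here is a Wendland-type function $(1-r)_+^4 (4r+1)$, chosen only as a familiar compactly supported example, and the names `wendland_c2`, `design_matrix` and `fit_surface` are hypothetical. Repeated knots with different $\delta$-parameters are simply repeated rows of `knots`.

```python
import numpy as np

def wendland_c2(r):
    """Assumed example basis: (1 - r)^4 (4 r + 1) for r < 1, zero otherwise."""
    r = np.asarray(r, dtype=float)
    return np.where(r < 1.0, (1.0 - r) ** 4 * (4.0 * r + 1.0), 0.0)

def design_matrix(X, knots, deltas, phi=wendland_c2):
    """Columns: the constant 1, then phi(|X - X_l*| / delta_l) for each knot."""
    d = np.linalg.norm(X[:, None, :] - knots[None, :, :], axis=2)
    return np.hstack([np.ones((len(X), 1)), phi(d / deltas[None, :])])

def fit_surface(X, z, knots, deltas, phi=wendland_c2):
    """Coefficients a_0, ..., a_M minimising the sum of squared residuals."""
    A = design_matrix(X, knots, deltas, phi)
    coef, *_ = np.linalg.lstsq(A, z, rcond=None)
    return coef
```

Using `numpy.linalg.lstsq` rather than forming the normal equations keeps the sketch usable even when the collocation matrix is close to rank deficient.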

Inspired by the algorithm proposed in [6], the strategy for the choice of $X_l^*$ and $\delta_l$, $l = 1, \ldots, M$,
can be summarized as follows:

• an input tolerance $tol_c$ is given;

• a first set of distinct knots $\{X_l^*,\ l = 1, \ldots, M_0\} \subset V_{xy}$ with $M_0 \le M$ is chosen.
This is done selecting the areas where the absolute value of the discrete Gaussian
curvature is greater than $tol_c$. A knot is located in the middle of an area if the sign
of the related curvature is positive. In case of negative curvature, four knots are
located near the boundary of the area also taking into consideration the suggestions
given by the data distribution;

• initial values for the $\delta$-parameters $\delta_l$, $l = 1, \ldots, M_0$, are determined considering the
knot separation distance;

• the final set of knots is defined by possibly increasing the multiplicity of the
previously selected knots. In this case, the $\delta$-parameters associated to the same knot
must be different.

Remark 3.1 We observe that, to be sure that the least squares problem has a unique
solution, it should be proved that the related collocation matrix is of full rank and this is
clearly equivalent to the uniqueness of the corresponding interpolation problem (the only
result we know about uniqueness of the radial interpolant defined with different scales is
given in a submitted paper [1] where interesting sufficient conditions are given). However,
we believe that the least squares problem is much more robust than the corresponding
interpolation problem and in all the numerical experiments we have never had problems
related to the rank of the collocation matrix (see also [5, 7]).

4  Numerical  results 

In  this  section  we  use  the  compactly  supported  radial  basis  function  [8] 

<fi(r)  :=  (1  -  r)l(l  +  3r) 

for checking the features of the proposed approach on two test functions. The first is
the well known Franke function and the second is the function $z(X) = 0.35(\sin(2\pi x) + \sin(2\pi y))$,
$X \in [0, 1]^2$. For both tests, $N = 441$ data points are considered. The exact
data are obtained by evaluating the functions at the vertices represented on the left of
Figure 1. The corresponding non-exact data are defined adding a random noise to the
exact values. In particular, in the first test we have used $\varepsilon = 0.07$ and in the second we
have used $\varepsilon = 0.08$ in $[0, 0.5]^2 \cup [0.5, 1]^2$ and $\varepsilon = 0.008$ otherwise. The related discrete
Gaussian curvature (dGc) distributions computed with the strategy sketched at the end
of Section 2 are reported in Figure 2.

Figures 3 and 4 relate to the first test with exact and non-exact data, respectively.
The distinct knots are $X_1^* = (0.207, 0.205)$, $X_2^* = (0.449, 0.797)$, $X_3^* = (0.756, 0.349)$ and
each of them is repeated three times with three different $\delta$-parameter values, 0.6, 0.4, 0.3.

The mean error (based on the squared residuals $(z_j - z(X_j))^2$ averaged over the $N$ data
points) is about 0.016 in Figure 3 and 0.025 in Figure 4 (it was about 1/3 using only 3
distinct knots with all the $\delta$-parameters equal to 0.6).
Figures 5 and 6 relate to the second test. The distinct knots are (0.258, 0.238), (0.749, 0.737),
(0.950, 0.264), (0.700, 0.264), (0.756, 0.050), (0.756, 0.300), (0.050, 0.751), (0.300, 0.751),
(0.264, 0.700), (0.264, 0.950). The related $\delta$-parameters are 0.8, 0.8, 0.6, 0.4, 0.6, 0.4, 0.6,
0.4, 0.4, 0.6. The mean error is about 0.020 in Figure 5 and 0.026 in Figure 6.

Fig. 2. dGc for the first (left) and second (right) set of non-exact data.

Fig. 3. The parent Franke surface (left) and its approximation (right).

Acknowledgments: The authors would like to thank the referees for their useful comments.

Abstract

In  this  paper  we  consider  the  boundary  over  distance  preconditioner  for  radial  basis 
function  interpolation  problems.  We  give  both  theoretical  and  numerical  results  indicating 
that  it  performs  extremely  well. 

1  Introduction 

Let $\Phi : \mathcal{R}^d \to \mathcal{R}$, $X = \{x_1, \ldots, x_N\}$ be a set of $N$ distinct points in $\mathcal{R}^d$ and $f$ be a real
valued function which we can evaluate at least at the $x_i$'s. Define

$$S_{\Phi,X} = \Big\{ g : g = \sum_{i=1}^{N} \lambda_i \Phi(\cdot - x_i), \ \text{where} \ \sum_{j=1}^{N} \lambda_j q(x_j) = 0 \ \text{for all} \ q \in \pi_1^d \Big\}. \qquad (1.1)$$

We consider the problem of finding an element $s$ of $S_{\Phi,X} + \pi_1^d$ satisfying the interpolation
conditions

$$s(x_i) = f(x_i), \quad \text{for all } x_i \in X. \qquad (1.2)$$

Assume $\Phi$ is strictly conditionally positive definite of order 2 (SCPD2) and $X$ is
unisolvent for $\pi_1^d$. Then there is a unique element of $S_{\Phi,X} + \pi_1^d$ satisfying the
interpolation conditions (1.2). This setting includes popular choices of the basic function such
as the thin-plate spline, $\Phi(\cdot) = |\cdot|^2 \log|\cdot|$, and minus the ordinary multiquadric,
$\Phi(\cdot) = -\sqrt{|\cdot|^2 + c^2}$. In this paper we consider various ways of formulating the
interpolation problem, showing in particular that a certain inexpensive change of basis
can dramatically improve its conditioning.

The usual way to formulate this problem is in terms of the functions $\{\Phi(\cdot - x_i)\}$ and
some basis $\{p_0, p_1, \ldots, p_d\}$ for $\pi_1^d$. Then the interpolation conditions together with the
side conditions taking away the extra degrees of freedom introduced by the polynomial
part can be written as

$$A\lambda + Pc = f \quad \text{and} \quad P^T \lambda = 0, \qquad (1.3)$$

where $A_{ij} = \Phi(x_i - x_j)$, $P_{ij} = p_j(x_i)$, and $f = [f(x_1), \ldots, f(x_N)]^T$. It is well known [3,
4, 5] that the matrix

$$\begin{pmatrix} A & P \\ P^T & O \end{pmatrix} \qquad (1.4)$$

of this usual formulation is frequently badly conditioned, even when the number of nodes
is small. Indeed many authors have commented on the numerical difficulties that solving
this system presents [3, 4, 5]. Results of Narcowich and Ward show that conditioning of
the system (1.4) depends very heavily on the geometry of the nodes. However, frequently
in numerical analysis a change of basis, or other reformulation, can make a highly
intractable problem tractable. Hence, our goal is to find an inexpensive but highly effective
preconditioner for RBF interpolation systems.

In this paper we establish properties of a preconditioning method for the RBF
interpolation equations which was first presented in Sibson and Stone [5]. In the following
section we give a detailed account of the preconditioning method. In Section 3 we prove
that the construction produces an $N \times (N-3)$ matrix $Q$ whose columns are orthogonal to
$P$, and which is of full rank whenever the nodes $X$ are unisolvent for $\pi_1^2$. Finally, Section
4 contains numerical results for different SCPD2 basic functions over a range of data
sets and scales. These numerical results show that using this inexpensive $O(N \log N)$
flop preconditioner, and variants of it, dramatically improves the conditioning of RBF
interpolation problems. See Figure 1 below.


(a) Multiquadric basic function. (b) Thin-plate spline basic function.

Fig. 1. Sorted 2-norm condition numbers of the unpreconditioned matrices (top)
and of the preconditioned matrices, $B$, (bottom) for fifty thousand random data sets of
size one hundred.


2  A  preconditioning  method 

A general approach to preconditioning interpolation problems with SCPD2 basic
functions in $\mathcal{R}^2$ [1, 5] is to choose $Q$ as any $N \times (N-3)$ matrix whose columns are orthogonal
to $P$ and which has rank $N - 3$. Letting $\lambda = Q\mu$ and premultiplying (1.3) by $Q^T$ gives the new
system to be solved for $\mu$, or equivalently $\lambda$,

$$B\mu = Q^T f \quad \text{where} \quad B = Q^T A Q. \qquad (2.1)$$

The three polynomial coefficients can then be found by a small subsidiary calculation.
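
The general reduction to $B\mu = Q^T f$ can be illustrated with NumPy. The sketch below is not the boundary over distance construction: as a stand-in it takes $Q$ to be an orthonormal basis of the orthogonal complement of the columns of $P$, obtained from a complete QR factorisation, and it uses the thin-plate spline as basic function. The function names are hypothetical.

```python
import numpy as np

def tps_matrix(X):
    """A_ij = Phi(x_i - x_j) for the thin-plate spline Phi = r^2 log r."""
    r2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    safe = np.where(r2 > 0.0, r2, 1.0)                  # log(1) = 0 on the diagonal
    return 0.5 * r2 * np.log(safe)                      # r^2 log r = 0.5 r^2 log(r^2)

def solve_preconditioned(X, f):
    """Solve (1.3) through the reduced system B mu = Q^T f."""
    N = len(X)
    A = tps_matrix(X)
    P = np.column_stack([np.ones(N), X])                # basis 1, x, y of linear polynomials
    Qfull, _ = np.linalg.qr(P, mode="complete")
    Q = Qfull[:, 3:]                                    # N x (N-3), columns orthogonal to P
    B = Q.T @ A @ Q                                     # reduced matrix
    mu = np.linalg.solve(B, Q.T @ f)
    lam = Q @ mu
    # small subsidiary calculation for the three polynomial coefficients
    c, *_ = np.linalg.lstsq(P, f - A @ lam, rcond=None)
    return lam, c, B
```

Since $Q^T(f - A\lambda) = 0$, the residual $f - A\lambda$ lies in the column space of $P$, so the final least squares step recovers $c$ exactly up to rounding.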

In this section we present the boundary over distance method of Sibson and Stone [5]
for constructing the matrix $Q$. We will prove in the subsequent section that $Q$ has full
rank and is orthogonal to $P$ for any set of distinct nodes $X = \{x_1, \ldots, x_N\} \subset \mathcal{R}^2$
which are unisolvent for $\pi_1^2$. These properties of $Q$ are well known (see e.g. [1, 5]) to
imply that the matrix of the preconditioned system $B = Q^T A Q$ is positive definite. The
construction of $Q$ is appealing in that for “interior” points $x_j$ of $X$ it is local. That is,
for such points the entries in the $j$-th column of $Q$ depend only on the geometry of the
nodes near $x_j$ and not on any properties of nodes far away.

Choose a closed bounded convex polygonal region $W$ of $\mathcal{R}^2$ such that $X \subset W$.
Suppose without loss of generality that $\{x_{N-2}, x_{N-1}, x_N\}$ is unisolvent for $\pi_1^2$. We will
refer to these points as special points. They are generally chosen so that they are well
spread throughout $W$. In our experience, and that of Sibson and Stone, for typical data
sets the choice of special points is not at all critical, as long as the triangle they define
has largish area. However, for contrived data sets, such as all but a very few points
on a straight line, the choice of special points becomes important. In these cases we
have observed that bad choices of special points can lead to large condition numbers.
However, the strategy of choosing the three special points to maximise the area of the
corresponding triangle has always led to small condition numbers.
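
The area-maximising choice of special points can be sketched by brute force. This Python fragment is only an illustration with hypothetical function names: it examines every triple, which costs $O(N^3)$ operations and is therefore suitable only for small data sets.

```python
from itertools import combinations

def triangle_area(p, q, r):
    """Area of the triangle with vertices p, q, r in the plane."""
    return 0.5 * abs((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1]))

def choose_special_points(points):
    """Indices of the three points spanning the triangle of largest area."""
    return max(combinations(range(len(points)), 3),
               key=lambda t: triangle_area(*(points[i] for i in t)))
```

For the points (0, 0), (2, 0), (0, 1), (0.5, 0.5), (1, 3) the largest triangle has area 3 and is spanned by the points with indices 0, 1 and 4.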

The region $W$ is divided into panels by intersecting a Voronoi diagram of the points
of $X$ with the region $W$. We denote this panelling of $W$ by

$$\mathcal{V}_W(X) = \bigcup_{i=1}^{N} V_i,$$

where $V_i$ is the Voronoi panel about the $i$th centre and is defined by

$$V_i = \{x \in W : |x - x_i| \le |x - x_j| \ \text{for all } 1 \le j \le N \text{ with } j \ne i\}.$$

Recall that the locus of points equidistant from two fixed points is the perpendicular
bisector of the segment connecting the points. It follows that each Voronoi region is
polygonal. Associated with a panel $V_i$ are its edges. These are a finite number of
distinct closed line segments of non-zero length. They are the boundaries between different
Voronoi panels, or between a Voronoi panel and $W^c$. The collection of all edges of all
the Voronoi panels will be denoted by $S$.

Definition 2.1 Two polygonal regions of $\mathcal{R}^2$ will be said to be strongly contiguous if
they have a common boundary of non-zero length.

Definition 2.2 Two Voronoi regions $V_i$ and $V_j$ will be said to be C-related if there is
a sequence

$$\{V_i, V_{\ell_1}, V_{\ell_2}, \ldots, V_{\ell_m}, V_j\}, \quad 1 \le i, j, \ell_1, \ldots, \ell_m \le N - 3,$$

in which all adjacent pairs are strongly contiguous.

Loosely speaking $V_i$ and $V_j$ are C-related if they are connected by a chain of strongly
contiguous pairs. C-related is an equivalence relation on the set $\{V_i\}_{i=1}^{N-3}$ of Voronoi
regions of non-special points. Therefore it breaks this set into a finite number of nonempty
equivalence classes $\{\mathcal{G}_\ell : 1 \le \ell \le k\}$.

Lemma 2.3 Let $\mathcal{G}_\ell$ be any of the equivalence classes above. Then there is at least one
Voronoi region $V_i$ in $\mathcal{G}_\ell$ which is strongly contiguous to either $W^c$ or one of
$\{V_{N-2}, V_{N-1}, V_N\}$.

Proof: Consider

$$T = \bigcup_{i : V_i \in \mathcal{G}_\ell} V_i.$$

This union is a closed bounded connected polygonal set whose boundary can be written
as the union of some of the line segments from $S$. Recall in particular that all these
line segments have non-zero length. Pick one line segment $\langle a, b \rangle$ from the boundary
of $T$. Since it forms part of the boundary of $T$, on one side of it lies a Voronoi region
$V_i$ from $\mathcal{G}_\ell$. On the other side lies either $W^c$ or another Voronoi region $V_j$. In the first
case the Lemma is proven. Consider the second case. If $1 \le j \le N - 3$ then $V_i$ is
strongly contiguous to $V_j$. Consequently, $V_j \in \mathcal{G}_\ell$. This contradicts $\langle a, b \rangle$ being on the
boundary of $T$. Hence, $N - 2 \le j \le N$ and the Lemma follows. □

We now detail the construction of the $N \times (N-3)$ matrix Q using boundary over distance
weights. Note that because most elements of Q are zero, sparse storage of Q requires
only $O(N)$ memory. A non-special point from $\{x_i : 1 \le i \le N-3\}$ which has a Voronoi tile
that is strongly contiguous to $W^c$ will be called a Voronoi external point. Define $V_E(X)$
as the set of indices of all Voronoi external points. All other points are referred to as
Voronoi internal points. The corresponding indices are $V_I(X) = \{1, \ldots, N-3\} \setminus V_E(X)$.

We first consider forming a column of Q for an index j such that $j \in V_I(X)$. In
this case the panel $V_j$ shares non-trivial edges only with other Voronoi panels and not
with $W^c$. The column is formed using boundary over distance weights, found from the
Voronoi diagram. For $j \in V_I(X)$ the boundary over distance weight is

    $r_{ij} = \dfrac{b(x_i, x_j)}{|x_i - x_j|}$  for all $V_i$ strongly contiguous to $V_j$,    (2.2)

where $b(x_i, x_j)$ is the length of the boundary between $V_i$ and $V_j$. For other values of
$i \ne j$, $r_{ij}$ is set to zero. In order that column j of Q is orthogonal to constants, the
diagonal element $r_{jj}$ is specified as

    $r_{jj} = -\sum_{i \ne j} r_{ij}.$    (2.3)

Finally, the jth column of R is scaled by dividing by the area of $V_j$ to obtain the jth
column of Q. Note that the column is by construction diagonally dominant, but not
strictly so.
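As an illustration of the internal-point case, the following sketch assembles one sparse column from boundary lengths and inter-point distances that are assumed to have been read off the Voronoi diagram already. The function name `internal_column`, the dictionary layout and the positive sign of the off-diagonal weights are assumptions made for the example; the only property relied on is that the column sums to zero, as required by (2.3).

```python
def internal_column(j, neighbours, area_j):
    """Sparse column j of Q for a Voronoi internal point.

    neighbours maps i -> (boundary_length, distance) for every V_i that is
    strongly contiguous to V_j (i != j); area_j is the area of V_j.
    Returns a dict {row index: entry}.
    """
    # Off-diagonal boundary over distance weights, cf. (2.2).
    r = {i: b / d for i, (b, d) in neighbours.items()}
    # Diagonal entry chosen so the column is orthogonal to constants, cf. (2.3).
    r[j] = -sum(r.values())
    # Scale the column of R by the tile area to obtain the column of Q.
    return {i: v / area_j for i, v in r.items()}


col = internal_column(2, {0: (1.0, 2.0), 1: (3.0, 1.5), 4: (0.5, 0.25)}, area_j=2.0)
print(col, sum(col.values()))
```

Storing only the non-zero entries per column is what keeps the memory requirement proportional to N, since each tile has a bounded number of neighbours on average.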

If $j \in V_E(X)$ then $V_j$ is strongly contiguous to $W^c$, the complement of W. The
boundary segment corresponds to a Voronoi edge between $x_j$ and an artificial point, the
reflection of $x_j$ in the boundary (see Figure 3 in [7]). The reflected point, $\hat{x}_j$, can be
written as a linear combination of the special points, i.e.,

    $\hat{x}_j = \lambda_N x_N + \lambda_{N-1} x_{N-1} + \lambda_{N-2} x_{N-2},$    (2.4)

where $\lambda_N + \lambda_{N-1} + \lambda_{N-2} = 1$. If $V_j$ has k edges with $W^c$ then k reflected points
$\{\hat{x}_j^1, \ldots, \hat{x}_j^k\}$ are required. Associated with each reflected point, $\hat{x}_j^s$, are the coefficients
$\{\lambda_N^s, \lambda_{N-1}^s, \lambda_{N-2}^s\}$. The boundary over distance weights for $\hat{x}_j^s$ are partitioned amongst
the special points to obtain, for all $j \in V_E(X)$ and $i \ne j$,

    $r_{ij} = \dfrac{b(x_i, x_j)}{|x_i - x_j|}$,  $V_i$ strongly contiguous to $V_j$,
    $r_{ij} = \sum_{s=1}^{k} \lambda_i^s \dfrac{b(x_j, \hat{x}_j^s)}{|x_j - \hat{x}_j^s|}$,  $i \in \{N, N-1, N-2\}$.    (2.5)

Of course, $V_j$ could be strongly contiguous with a Voronoi panel associated with a special
point. If this is the case,
$r_{ij} = \dfrac{b(x_i, x_j)}{|x_i - x_j|} + \sum_{s=1}^{k} \lambda_i^s \dfrac{b(x_j, \hat{x}_j^s)}{|x_j - \hat{x}_j^s|}$.
Again, for other values of $i \ne j$, $r_{ij}$ is set to zero. Finally $r_{jj}$ is specified as in (2.3) and
column j of Q is defined as column j of R scaled by dividing by the area of $V_j$.

Partition Q as

    $Q = \begin{bmatrix} E \\ F \end{bmatrix},$    (2.6)

where E is $(N-3) \times (N-3)$. Thus E results from interactions between non-special
points, and F from those between special and non-special points. Note in the construction
above that for $1 \le i, j \le N-3$, $e_{ij}$ is non-zero if and only if $V_i$ is strongly contiguous
to $V_j$. Furthermore, note that E is necessarily column diagonally dominant, with strict
dominance in column j whenever $V_j$ is strongly contiguous to the Voronoi region of a
special point, or to $W^c$.

Relabelling if necessary, we can assume the indices of the Voronoi regions in each of the
equivalence classes $\mathcal{G}_\ell$ form a contiguous subset of $\{1, \ldots, N-3\}$. Similarly, we can also
assume that the indices corresponding to any $\mathcal{G}_\ell$ precede those corresponding to $\mathcal{G}_{\ell+1}$.
Furthermore, by construction, if $i \ne j$ none of the regions in $\mathcal{G}_i$ is strongly contiguous with
a region in $\mathcal{G}_j$. Thus, the corresponding entries in the matrix E constructed using boundary
over distance weights and artificial points are zero. That is, E is block diagonal with
the square matrix $E_{\ell\ell}$ on the main diagonal corresponding to the equivalence class of
Voronoi regions $\mathcal{G}_\ell$. More precisely, Q will have the form

    $Q = \begin{bmatrix}
        E_{11} & O      & \cdots & O      \\
        O      & E_{22} & \cdots & O      \\
        \vdots & \vdots & \ddots & \vdots \\
        O      & O      & \cdots & E_{kk} \\
        F_1    & F_2    & \cdots & F_k
    \end{bmatrix}.$    (2.7)

3  Properties  of  the  matrix  Q 

In this section we establish the fundamental properties of the matrix Q of (2.7), namely
that it is of full rank and that its columns are orthogonal to those of P.

Definition 3.1  For $m \ge 2$, an $m \times m$ matrix K is irreducible if there does not exist
an $m \times m$ permutation matrix P such that

    $P K P^T = \begin{bmatrix} M_{11} & M_{12} \\ O & M_{22} \end{bmatrix},$

where $M_{11}$ is $r \times r$, $M_{22}$ is $(m-r) \times (m-r)$, and $1 \le r < m$.

The following result is well known; see, for example, Varga [6].

Theorem  3.2  Suppose  the  square  matrix  K  is  irreducible  and  row  (column)  diagonally 
dominant  with  strict  row  (column)  diagonal  dominance  in  at  least  one  row  (column). 
Then  K  is  invertible. 
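The hypotheses of Theorem 3.2 can be checked mechanically for a small dense matrix. The sketch below is illustrative only: it uses the standard characterisation that a matrix is irreducible exactly when the directed graph with an edge from i to j for each non-zero entry $K_{ij}$ is strongly connected, it compares floating-point values exactly (a tolerance would be needed in practice), and the function names are hypothetical.

```python
def column_dominance(K):
    """Return (weak, strict_somewhere) for column diagonal dominance of square K."""
    m = len(K)
    weak, strict = True, False
    for j in range(m):
        off = sum(abs(K[i][j]) for i in range(m) if i != j)
        diag = abs(K[j][j])
        if diag < off:
            weak = False      # dominance fails in column j
        elif diag > off:
            strict = True     # strict dominance in column j
    return weak, strict


def is_irreducible(K):
    """True if the directed graph of non-zero entries of K is strongly connected."""
    m = len(K)

    def reaches_all(edge):
        seen, stack = {0}, [0]
        while stack:
            i = stack.pop()
            for j in range(m):
                if j not in seen and edge(i, j):
                    seen.add(j)
                    stack.append(j)
        return len(seen) == m

    # Every node reachable from node 0, both in the graph and in its reverse.
    return reaches_all(lambda i, j: K[i][j] != 0) and reaches_all(lambda i, j: K[j][i] != 0)


K = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 1.5]]
weak, strict = column_dominance(K)
print(weak and strict and is_irreducible(K))  # hypotheses of Theorem 3.2 hold for this K
```

A block upper triangular matrix such as [[1, 1], [0, 1]] fails the irreducibility test, which is the situation Definition 3.1 excludes.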

Lemma 3.3  Let X be a finite set of distinct points unisolvent for $\pi_2$. Let $E_{\ell\ell}$ be one of
the square blocks from the diagonal of Q constructed in the previous section. Then $E_{\ell\ell}$
is invertible.

Proof:  From the construction $E_{\ell\ell}$ is column diagonally dominant. Furthermore, by
Lemma 2.3 the diagonal dominance is strict for at least one column of $E_{\ell\ell}$. From the
definition of the equivalence relation C-related there is a chain of strongly contiguous
pairs of Voronoi regions connecting any two Voronoi regions in $\mathcal{G}_\ell$. This implies the
corresponding entries in $E_{\ell\ell}$ are non-zero and hence, from [6, Theorem 1.6], $E_{\ell\ell}$ is irreducible.
It follows from Theorem 3.2 that $E_{\ell\ell}$ is invertible. □

Theorem 3.4  The matrix Q described in Section 2 is orthogonal to P, i.e. $Q^T P = O$.

Proof:  Omitted, see [2] and [7]. □

Theorem 3.5  Let X be a set of distinct points unisolvent for $\pi_2$. Let Q be formed by
the construction in Section 2 and $A_{ij} = \Phi(x_i - x_j)$ where $\Phi$ is strictly conditionally
positive definite of order 2. Then $B = Q^T A Q$ is positive definite.

Proof:  From Lemma 3.3 each of the matrices $E_{\ell\ell}$ occurring in the block partitioning of
Q given in Equation (2.7) is invertible. Hence Q has full rank. Also from Theorem 3.4 the
columns of Q are orthogonal to the columns of P. Let p be any non-zero vector in $\mathbb{R}^{N-3}$,
and define $\lambda = Qp$. Then $\lambda \ne 0$, $P^T \lambda = P^T Q p = 0$, and $p^T B p = p^T Q^T A Q p = \lambda^T A \lambda$.
Hence, by the definition of strictly conditionally positive definite, $p^T B p > 0$ whenever
$p \ne 0$ and B is symmetric positive definite. □

Theorem 3.6  Let $\Phi$ be strictly conditionally positive definite of order 2 and such that
$\Phi(hx, hy) = h^\gamma \Phi(x, y) + p_h(x - y)$, $h > 0$, with $p_h \in \pi_3$. The preconditioned matrix $B_h$,
which corresponds to preconditioning on the point set hX, is a homogeneous function
of scale. Thus its condition number and the relative clustering of its eigenvalues are the
same over all scales.

Proof:  Omitted, see [7]. □

Theorem 3.6 applies in particular to the usual thin-plate spline, $\Phi(\cdot) = |\cdot|^2 \log|\cdot|$, in
$\mathbb{R}^2$.

The extended version of this paper [7] contains a proof that the elements $B_{ij}$ decay
like $|x_i - x_j|^{-\kappa}$ when $|x_i - x_j|$ is large. For the multiquadric $\kappa$ is three and for the
thin-plate spline $\kappa$ is two.

Definition 3.7  The preconditioned matrix S is obtained from B by pre-multiplying and
post-multiplying B by the diagonal matrix D with ii entry $1/\sqrt{b_{ii}}$.
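The scaling in Definition 3.7 amounts to $S = DBD$ with $D = \mathrm{diag}(1/\sqrt{b_{ii}})$, so that S has unit diagonal. A minimal dense sketch follows, assuming B is symmetric positive definite (so every $b_{ii} > 0$) and is stored as a list of rows; the function name is hypothetical.

```python
import math


def symmetric_diagonal_scaling(B):
    """Return S = D B D, where D is diagonal with entries 1/sqrt(B[i][i])."""
    n = len(B)
    d = [1.0 / math.sqrt(B[i][i]) for i in range(n)]
    return [[d[i] * B[i][j] * d[j] for j in range(n)] for i in range(n)]


B = [[4.0, 2.0],
     [2.0, 9.0]]
S = symmetric_diagonal_scaling(B)
print(S)  # unit diagonal; off-diagonal entries become b_ij / sqrt(b_ii * b_jj)
```

Because the same matrix D is applied on both sides, S remains symmetric positive definite whenever B is.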

4  Numerical  results 

In this section we present numerical results for the thin-plate spline and multiquadric
basic functions. In the following tables the matrix $A_\Phi$ is defined in (1.4), B in (2.1),
S in Definition 3.7 and the homogeneous matrix, C, is presented in [1]. In Table 1 we
show 2-norm condition numbers of matrices for the various preconditioning techniques
over seven different scales. It is clear that the algorithm in Section 2 gives a matrix
which dramatically improves the conditioning of the interpolation problem, in one case
by a factor of $10^{14}$! Tables 2 and 3 contain condition numbers of the matrices resulting
from applying the preconditioning techniques of this paper for the thin-plate spline and
multiquadric basic functions. For N < 3200, the entries in the tables are the maximum
over one hundred random point sets of size N. For N = 3200, the tables contain the
maximum over twenty random point sets of size 3200. In all cases the preconditioning
results in a smaller condition number. For these basic functions the maximum observed
condition number of the scaled preconditioned matrix, S, grows very slowly with N.
Certainly there is no numerical evidence of power growth with N.


Scale          Conventional    Homogeneous    Preconditioned    Scaled
parameter a    matrix A_Φ      matrix C       matrix B          matrix S

0.001          1.531(11)       1.534(5)       4.905(1)          2.405(1)
0.01           1.544(9)        1.534(5)       4.905(1)          2.405(1)
0.1            1.597(7)        1.534(5)       4.905(1)          2.405(1)
1              3.107(5)        1.534(5)       4.905(1)          2.405(1)
10             1.915(6)        1.534(5)       4.905(1)          2.405(1)
100            1.271(11)       1.534(5)       4.905(1)          2.405(1)
1000           4.006(15)       1.534(5)       4.905(1)          2.405(1)

Tab. 1. Condition numbers for one hundred points in $[0, a]^2$ and the thin-plate
spline. The point set for scale a is $X_a = aX_1$.


In an attempt to rule out the possibility that our numerical results were flukes due to
the small number of 100 experiments, we also conducted 50,000 trials with random data
sets of size 100. The results of these trials are shown in Figure 1. The maximum condition
number, over all trials with the thin-plate spline, for the matrix $A_\Phi$ was 1.2465(9), for
matrix C, 1.5750(9) and for matrix S, 1.8066(2). In our experiments the matrix S is
always well conditioned. This held even for geometries of centres for which the matrix
$A_\Phi$ is very badly conditioned.

To test further the behaviour of S for “bad” configurations of points, a similar experiment
was run with one thousand trials of one hundred points almost on a circle. The
maximum condition numbers of the $A_\Phi$ matrix, C matrix and S matrix were 1.2885(9),
7.2692(8) and 6.6005(2) respectively over 1000 trials. Even though the Voronoi regions
are long and thin, the matrix S is still well conditioned!

Number of      Conventional    Homogeneous    Preconditioned    Scaled
data points    matrix A_Φ      matrix C       matrix B          matrix S

200            6.555(7)        3.068(7)       1.617(3)          6.028(1)
400            5.675(8)        3.397(8)       1.945(3)          8.946(1)
800            1.960(10)       1.348(10)      2.034(3)          9.775(1)
1600           1.092(10)       8.413(9)       8.099(3)          1.258(2)
3200           4.997(10)       3.783(10)      1.261(4)          1.569(2)

Tab. 2. Maximum condition numbers encountered over a sample of 100 random
point sets of size N in $[0, 1]^2$ with the thin-plate spline.

Number of      Conventional    Preconditioned    Scaled
data points    matrix A_Φ      matrix B          matrix S

200            2.014(8)        1.532(2)          4.224(1)
400            2.045(10)       5.932(2)          7.669(1)
800            6.641(10)       4.559(2)          5.826(1)
1600           1.554(10)       7.025(2)          5.601(1)
3200           2.477(11)       9.362(2)          6.280(1)

Tab. 3. Maximum condition numbers encountered over a sample of 100 random
point sets of size N in $[0, 1]^2$ with the multiquadric function, parameter $c = 1/\sqrt{N}$.

Abstract

Radial basis function interpolation has an advantage over other methods in that the
interpolation matrix is nonsingular under very weak conditions on the location of the
interpolation points. However, we show that point location can have a significant effect
on the performance of an approximation in certain cases. Specifically, we consider
multiquadric and thin plate spline interpolation to small data sets where derivative estimates
are required. Approximations of this type are important in the motion of unsteady
interfaces in fluid dynamics. For data points in the plane, it is shown that interpolation to
data on a circle can be related to the polynomial case. For scattered data on the sphere,
a comparison is made with the results of Sloan and Womersley.


1  Introduction 

Radial basis functions (RBFs) such as multiquadrics or thin plate splines have been
successfully used for scattered data approximation in many applications. They have been
shown to perform well for data fitting, although problems of ill-conditioning and the
computational cost of processing large data sets must be handled carefully. In general,
when considering the accuracy of an RBF interpolant, a balance must be achieved between
the reduction in fill distance necessary for convergence of the approximation to an
assumed underlying function and the need to maximise the separation distance between
data points to avoid problems of ill-conditioning [4].

In the present study, we focus on the use of RBF approximation as one stage of a
larger algorithm to compute the evolution of an unsteady interface in fluid dynamics.
The accuracy of the approximations made in the algorithm and the interaction between
its different stages determine whether the output is close to the true solution of the
governing equations or whether spurious effects are produced. In the three-dimensional
setting, a typical example is described by Zinchenko et al. [8] where the deformation of
liquid drops in a viscous medium is studied. A critical feature of the algorithm is the
approximation of the normal directions and curvatures of the droplet surface defined at
a number of discrete points.

The focus here is algorithmic rather than theoretical and we investigate the performance
of multiquadric and thin plate spline local interpolants applied to the determination
of normal directions and curvatures of a smooth, closed surface. Certain configurations
of data points, such as points located on a circle, impose constraints on the interpolant.
A framework for understanding the behaviour of the RBF interpolants is provided by a
comparison with the multivariate polynomial interpolant of de Boor and Ron [1] and by
considering the free parameter in the multiquadric as a tensioning parameter [2].

2  Approximation  method 

A common approach to solving fluid dynamics problems that include moving interfaces
combines a computational grid with meshless approximation methods. The governing
partial differential equations, or the corresponding integral equation formulation, are
solved on the grid, while quantities characterising the interface are computed as meshless
scattered data approximations.

Here we examine the behaviour of local RBF approximations in the general context
described by Zinchenko et al. [8]. For a given data set, a particular point is selected
together with its nearest neighbours, giving a set of typically 6 or 7 points. The initial
locations of these points may be determined by a regular mesh, but the surface is allowed
to deform so that the approximation is essentially to a small set of scattered data. The
constructed RBF interpolant, s, can be expressed as

    $s(x) = \sum_{j=1}^{N} a_j \phi(\|x - x_j\|) + \sum_{i=1}^{K} b_i p_i(x),$

with the constraint

    $\sum_{j=1}^{N} a_j p_i(x_j) = 0, \quad \text{for } 1 \le i \le K,$

where $x \in \mathbb{R}^2$ and $\{p_i(x)\}_{i=1:K}$ is a basis for the space of bivariate polynomials of degree
$\le m-1$ with $K = m(m+1)/2$. The chosen forms for $\phi$ are the thin plate spline

    $\phi(\|x - x_j\|) = \|x - x_j\|^2 \log \|x - x_j\|,$    (TPS)

and the multiquadric

    $\phi(\|x - x_j\|) = (\|x - x_j\|^2 + c^2)^{1/2},$    (MQ)

with $\|\cdot\|$ taken to be the Euclidean norm.
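The interpolation conditions and the side constraint together give a square linear system for the coefficients $a_j$ and $b_i$. The sketch below is one possible small-scale realisation, not the implementation used in the paper: it assumes a linear polynomial part (m = 2, K = 3, basis 1, x, y) for both basic functions, solves the dense system by a hand-written Gaussian elimination that is only suitable for the 6 or 7 point sets discussed here, and all function names are hypothetical.

```python
import math


def mq(r, c=1.0):
    """Multiquadric basic function (r^2 + c^2)^(1/2)."""
    return math.sqrt(r * r + c * c)


def tps(r):
    """Thin plate spline r^2 log r, with the limiting value 0 at r = 0."""
    return r * r * math.log(r) if r > 0.0 else 0.0


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def fit(points, values, phi):
    """Return s(x, y) interpolating values at points, with a linear polynomial part."""
    n = len(points)
    P = [[1.0, px, py] for px, py in points]
    # Rows 0..n-1: interpolation conditions; last three rows: sum_j a_j p_i(x_j) = 0.
    A = [[phi(math.dist(points[i], points[j])) for j in range(n)] + P[i] for i in range(n)]
    A += [[P[j][i] for j in range(n)] + [0.0] * 3 for i in range(3)]
    coef = solve(A, list(values) + [0.0] * 3)
    a, b = coef[:n], coef[n:]

    def s(x, y):
        rbf = sum(aj * phi(math.dist((x, y), pj)) for aj, pj in zip(a, points))
        return rbf + b[0] + b[1] * x + b[2] * y

    return s


pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.3), (-0.4, 0.7)]
vals = [0.2, -0.5, 0.9, 0.1, -0.3, 0.6]
s_mq = fit(pts, vals, lambda r: mq(r, 0.4))
print(max(abs(s_mq(x, y) - f) for (x, y), f in zip(pts, vals)))  # interpolation residual
```

Passing `tps` instead of the multiquadric lambda gives the thin plate spline interpolant from the same routine. For large c the multiquadric system becomes badly conditioned, so this simple solver should not be expected to remain accurate there.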

A framework for interpreting the computed results in the context being considered
can be derived from [2], where the arbitrary parameter, c, of the MQ function is viewed
as a tensioning parameter. As $c \to \infty$ the MQ interpolant approaches the corresponding
polynomial interpolant to the given data, while as $c \to 0$ the MQ surface is tensioned.
Multivariate polynomial interpolation can fail on particular point sets and this
has provided a motivation for using RBF methods. However, the algorithm of de Boor
and Ron [1] provides a reliable means of computing the ‘least’ polynomial interpolant.
This algorithm is used to compute a polynomial fit as one reference point for the
interpretation of the MQ interpolants. A second reference point is provided by the TPS
interpolant, which gives a minimum energy surface in a certain norm. This is shown to
correspond closely to the MQ fit for a ‘small’ but nonzero value of c. The MQ interpolant
can thus be shown to connect the minimum energy, tensioned, TPS surface with
the polynomial fit to given data as c increases. In a fluid dynamics context a fluid-fluid
interface is often assumed to be represented by a $C^\infty$ function (although cusps may
occur, requiring a change in the representation). This would suggest that a high degree
polynomial would be preferred to a TPS surface.


Fig. 1. Interpolants to random data at 6 points (+) in the plane: (left) polynomial
(upper) and multiquadric (c = 10) (lower), contours [0:0.1:2]; (right) thin plate spline
(upper) and multiquadric (c = 0.4) (lower), contours [0:0.1:1.1].


3  Scattered  data  in  the  plane 

To illustrate the behaviour of local interpolation by MQ and TPS methods, random
points in the xy-plane (with $-1 < x_i, y_i < 1$, for $i = 1:6$) are associated with random
data values, $f_i$ ($-1 < f_i < 1$). Figure 1 shows, in the upper frames, the two reference
interpolating surfaces: (left) the polynomial surface computed by the algorithm of [1]
and (right) the TPS surface. The lower frames give the contours of the MQ interpolants
for c = 10.0 (left) and c = 0.4 (right). There is a close correspondence between the upper
and lower frames on each side, but a large difference between the polynomial and TPS
surfaces.

Fig. 2. Effect of varying the parameter c on multiquadric interpolants to random data
in the plane: (left) norms of the difference between multiquadric and polynomial
interpolants (upper curve: $\|\cdot\|_\infty$; lower curve: $d_2 = \|\cdot\|_2/\sqrt{N}$); (right) curvature
($\kappa = 2H$) computed at the centroid for the multiquadric, thin plate spline and
polynomial interpolants.

Figure 2 (left) shows the difference between the MQ surface and the polynomial
reference interpolant computed on a regular grid on the interior of the circle with centre
at the centroid of the data points (0.44, -0.09) and radius the maximum distance from
the centroid to a data point. There is convergence of the MQ surface to the polynomial
as $1/c \to 0$, but the condition number of the interpolation matrix increases until the
calculation cannot be continued. For c = 10.0 the condition number is $3 \times 10^7$.

As an indication of the behaviour of first and second partial derivatives of the
interpolating surfaces we calculate the curvature at the centroid of the data points for the
polynomial and TPS, together with MQ as c varies, using $\kappa = 2H$ where H is the mean
curvature. Figure 2 (right) shows that $\kappa_{MQ}$ for the MQ interpolant coincides with the
value $\kappa_{TPS} = -0.46$ for the TPS when $c \approx 0.4$. When $c < 0.4$, $\kappa_{MQ} < \kappa_{TPS}$, while
$\kappa_{MQ} \to \kappa_P = -9.78$, the polynomial curvature, as c increases.
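For a surface given as a graph z = s(x, y), the quantity $\kappa = 2H$ can be evaluated from the first and second partial derivatives of the interpolant using the standard mean curvature formula for a graph. The sketch below assumes the sign convention of an upward-pointing normal, which gives a negative value on the upper cap of a sphere; the function name and argument order are hypothetical, and the derivatives are assumed to be supplied by whichever interpolant is in use.

```python
def curvature_2h(sx, sy, sxx, sxy, syy):
    """kappa = 2H for the graph z = s(x, y) from its partial derivatives at a point."""
    num = (1.0 + sy * sy) * sxx - 2.0 * sx * sy * sxy + (1.0 + sx * sx) * syy
    return num / (1.0 + sx * sx + sy * sy) ** 1.5


# Check on the upper cap of a sphere of radius R, z = sqrt(R^2 - x^2 - y^2),
# at the point (x, y) = (3, 0): the result should equal -2/R.
R, x, y = 9.0, 3.0, 0.0
z = (R * R - x * x - y * y) ** 0.5
kappa = curvature_2h(-x / z, -y / z,
                     -(R * R - y * y) / z ** 3,
                     -x * y / z ** 3,
                     -(R * R - x * x) / z ** 3)
print(kappa, -2.0 / R)
```

At a point where the tangent plane is horizontal the formula reduces to the sum of the two pure second derivatives.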

An interesting example is presented in [1] of polynomial interpolation for points
located at the vertices of a regular hexagon

    $(x_i, y_i) = \left(\cos\frac{2\pi i}{6}, \sin\frac{2\pi i}{6}\right), \quad i = 1, \ldots, 6,$    (3.1)

with data values $f_i = (-1)^i$. This gives the interpolant

    $p(x, y) = x^3 - 3xy^2.$

Since the points lie on the unit circle, the quadratic polynomial

    $p_2(x, y) = 1 - x^2 - y^2$    (3.2)

vanishes at the data points and this causes difficulties for general polynomial methods.
MQ  interpolants  do  not  suffer  from  these  difficulties.  When  c  =  10.0,  the  MQ  surface 
is  very  close  to  (3.2).  As  c  becomes  smaller,  the  MQ  surface  approaches  that  of  the 
TPS  with  the  data  values  becoming  local  maxima  or  minima  as  the  surface  is  tensioned. 
In  addition,  the  restriction  of  the  data  points  to  a  circle  implies  that  the  interpolating 
polynomial  is  harmonic,  but  the  convergence  of  the  approximation  is  only  first  order  [1]. 
The  MQ  surface  for  large  c  inherits  the  properties  of  the  polynomial  fit.  Thus,  points  on 
a  circle  are  ‘good’  if  the  data  being  interpolated  correspond  to  a  harmonic  function,  but 
‘bad’  if  the  data  describe  a  function  which  has  a  maximum  or  minimum  within  the  circle 
or  a  singularity.  These  constraints  on  the  interpolant  are  discussed  further  in  Section  5. 
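The claims about the hexagon example are easy to confirm numerically. The short check below assumes the vertices are at angles $2\pi i/6$, $i = 1, \ldots, 6$, which is the reading of (3.1) under which the stated data values $(-1)^i$ are reproduced; the function names `p` and `p2` are illustrative.

```python
import math


def p(x, y):
    """Cubic interpolant x^3 - 3 x y^2 (the real part of (x + iy)^3)."""
    return x ** 3 - 3.0 * x * y ** 2


def p2(x, y):
    """Quadratic that vanishes on the unit circle."""
    return 1.0 - x ** 2 - y ** 2


hexagon = [(math.cos(2.0 * math.pi * i / 6), math.sin(2.0 * math.pi * i / 6))
           for i in range(1, 7)]

for i, (x, y) in enumerate(hexagon, start=1):
    # p reproduces the alternating data and p2 vanishes at every vertex.
    print(i, round(p(x, y), 12), round(p2(x, y), 12))
```

On the unit circle $p$ equals $\cos 3\theta$, which is why it alternates in sign at the vertices. Its Laplacian is $6x - 6x = 0$, consistent with the statement that the interpolating polynomial is harmonic. Any multiple of `p2` can be added to an interpolant without changing its values at the data points, which is the source of the difficulty for general polynomial methods.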

4  Scattered  data  on  the  sphere 

In this section we examine the accuracy obtained from three separate methods for
interpolating scattered data on the unit sphere $S^2 \subset \mathbb{R}^3$. In particular we compare the
results obtained using the MQ basis function in $\mathbb{R}^3$ with those obtained using the
spherical harmonics of Sloan and Womersley$^1$ [6] and the $C^1$ Hermite interpolant of Renka [5].
For the multiquadric function, we list the uniform norm interpolation errors calculated
using a range of values for the shape parameter c.


Fig.  3.  Minimum  energy  points  and  spherical  cap. 


The  point  distribution  used  is  the  256  ‘minimum  energy’  points  of  Fliege  and  Maier 
[3]  and  the  uniform  norm  interpolation  errors  are  calculated  at  points  distributed  on  a 
spherical  cap  (see  [5]). 

The following functions are used for the comparisons in Table 1, where the results
presented in [7] are labelled ‘W&S’.

    $F1 = \ldots, \qquad F2 = -5\sin(1 + 10z),$

    $F3 = \|x\|_1/10, \qquad F4 = \sin^2(1 + \|x\|_1)/10.$

We note from Table 1 that the multiquadric function provides consistently better
interpolants to the four test functions compared with the spherical harmonics. Here, the
points have been chosen to minimise the interpolation errors for the harmonic functions,
yet we see from results given in [7] that increasing the number of points in the
distribution (which also increases the degree of the interpolating function) does not necessarily
produce better accuracy. However, these point distributions when used for the
multiquadric function provide consistently better accuracy. Further evidence suggests that
points considered optimal for the spherical harmonics are also ‘good’ for the
multiquadric function when compared to an equal number of generally scattered points. However,
this is due to the uniformity of the point distributions and similar results can be obtained
on a refined icosahedral mesh.

$^1$ Uniform norm errors used for comparison are approximate only and were taken from graphical representations
presented in Womersley and Sloan [7].

Method         F1            F2        F3        F4

W&S            2.0000e-10    0.5000    0.1100    0.0500
Renka          0.0013        0.1951    0.0054    0.0055
MQ c = 0.01    6.0128e-04    0.3276    0.0051    0.0051
MQ c = 1       4.5807e-10    0.0175    0.0076    0.0062
MQ c = 2       2.2615e-13    0.0227    0.0079    0.0065

Tab. 1. Comparison of uniform norm errors.


Method         12 pts    92 pts        362 pts

Renka          0.1730    0.0103        0.8230e-03
MQ c = 0.01    0.2596    0.0170        0.0020
MQ c = 1       0.0715    7.7662e-05    1.9678e-10
MQ c = 2       0.0442    3.8206e-05    3.4113e-11

Tab. 2. Multiquadric vs Renka for $f(x, y, z) = \sin(x + y) + \sin(xz)$.

The Renka algorithm produced similar results to those obtained using the multiquadric
(for small c) for the F3 and F4 functions, although the results for the functions F1 and
F2 were poor. Further comparisons with the Renka algorithm have been made using 12,
92 and 362 icosahedral points to interpolate the function $f(x, y, z) = \sin(x + y) + \sin(xz)$.
The uniform norm interpolation errors have been calculated on the previously mentioned
spherical cap. Again we see that the multiquadric function produces better accuracy than
the Renka method when the number of interpolation points is increased.

5  Evolution  of  a  smooth  closed  surface 

In this section we return to the local interpolation scheme of §3 and apply it to scattered
data on a smooth closed surface. This is the setting described in [8], where initially the
interface is spherical with the point locations determined by subdivision of an icosahedral
mesh. Each set of points consists of a central point together with its nearest neighbours,
giving sets of 6 points associated with the 12 vertices of the icosahedron and sets of
7 points otherwise. The local method of Renka [5] is followed and, for a chosen point,
a local coordinate system is defined with this point on the z-axis. The local point set
is projected onto the xy-plane and the surface heights provide the data values. This
typically gives a configuration very close to the hexagon points (3.1) with an additional
point at the centre, except for those points associated with the icosahedron vertices where
the arrangement is a pentagon. As the icosahedral mesh is refined these configurations
become less regular.

The addition of the central point to the hexagon points increases the order of the
approximation. When the surface is spherical, the symmetry of the data ensures that the
computed unit normal at the centre point for polynomial, MQ or TPS is exact except for
rounding error (e.g. for MQ the error is $\|n - n_{MQ}\|_2 = 3 \times 10^{-14}$). However, taking MQ
with c = 10 and a sphere of radius 9, if the central point is displaced from the origin to
(0.01, 0.01) the error in the normal is $3 \times 10^{-3}$. To illustrate convergence for an irregular
point set, the hexagon points are perturbed by the addition of a factor $(i - 1)\epsilon h [1, 1]^T$
for points $i = 1, 2, \ldots, 6$ with h the radius of the circumcircle and taking $\epsilon = 0.05$. For
MQ with c = 10, the error in the surface normal is $O(h^3)$ whereas, for c = 0.4, the error
is larger and the rate of convergence varies (see Table 3).


h       $\|n - n_{MQ}\|_2$, c = 10    $\|n - n_{MQ}\|_2$, c = 0.4

1.0     3.15 x 10^-5                  6.14 x 10^-3
0.5     3.85 x 10^-6                  1.26 x 10^-3
0.1     3.06 x 10^-8                  6.35 x 10^-6
0.05    4.99 x 10^-9                  8.47 x 10^-7

Tab. 3. Error in MQ approximation to surface normal of sphere, irregular point
set.
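The convergence behaviour reported in Table 3 can be quantified by the observed order between successive rows, $\log(e_k/e_{k+1}) / \log(h_k/h_{k+1})$. The sketch below applies this standard estimate to the tabulated errors; the variable and function names are illustrative.

```python
import math

# Values transcribed from Table 3.
h = [1.0, 0.5, 0.1, 0.05]
err_c10 = [3.15e-5, 3.85e-6, 3.06e-8, 4.99e-9]
err_c04 = [6.14e-3, 1.26e-3, 6.35e-6, 8.47e-7]


def observed_orders(h, err):
    """Observed convergence order between each pair of successive rows."""
    return [math.log(err[k] / err[k + 1]) / math.log(h[k] / h[k + 1])
            for k in range(len(h) - 1)]


print([round(q, 2) for q in observed_orders(h, err_c10)])
print([round(q, 2) for q in observed_orders(h, err_c04)])
```

For c = 10 the estimates are close to 3 for the first two refinements and drop to about 2.6 for the last, broadly in line with the stated $O(h^3)$ behaviour. For c = 0.4 they range from roughly 2.3 to 3.3, which reflects the varying rate noted in the text.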

Accurate curvature values are essential for an interface which is driven by surface
tension. The exact value of $\kappa = -2/9$ for a sphere of radius 9, together with the computed
values, are shown in Table 4. The polynomial and MQ with c = 10 are close to the exact
value.


Method          $\kappa$

exact           -0.222...
polynomial      -0.222912
MQ c = 0.1      -1.638002
MQ c = 10.0     -0.225387

Tab. 4. Curvature, $\kappa = 2H$, evaluated at the central point of a regular hexagon.

It is found that, for the icosahedral mesh with N = 362, the local point sets are
sufficiently regular to give good accuracy for surface normals and curvature using MQ
interpolants when c is chosen to be ‘large’ in relation to the point spacing. This mesh also
gives a corresponding accuracy for the discretised integral equation. These points can
thus be considered ‘good’ for the MQ approximation. However, if the mesh is further
refined or the surface deforms during its evolution, then the approximation becomes
‘less good’ as the regularity of the point locations is lost. Numerical experiments suggest
second order convergence with point separation for irregular local point sets.

6  Conclusions 

The  behaviour  of  MQ  and  TPS  interpolants  can  be  interpreted  by  reference  to  the 
corresponding  ‘least’  polynomial  interpolant,  with  the  MQ  connecting  the  polynomial 
$C^\infty$ surface to the tensioned surface of the TPS as the parameter c decreases. The MQ
interpolant  with  ‘large’  c  (relative  to  the  point  separation)  exhibits  the  properties  of  the 
polynomial  case  and  is  similarly  affected  by  the  location  of  data  points.  Thus,  points 
on  a  circle  in  the  plane  can  be  ‘good’  if  the  function  to  be  represented  is  harmonic, 
but  in  general  give  only  first  order  convergence  on  the  interior.  For  data  on  the  sphere, 
‘good’  points  for  polynomial  interpolation  are  also  good  for  the  MQ  with  ‘large’  c,  but 
other  near  equispaced  point  distributions  appear  to  give  similar  accuracy  with  MQ. 
The  tensioning  effect  of  smaller  values  of  c  can  improve  the  results  if  the  underlying 
function is not $C^\infty$. When applied to an evolving interface, starting from an initially
spherical  shape  and  a  refined  icosahedral  point  distribution,  it  is  found  that  local  MQ 
approximations  to  the  surface  derivatives  are  affected  by  the  point  locations.  This  can 
be  understood  by  reference  to  the  polynomial  interpolant  to  data  located  on  a  circle  and 
causes an irregularity in the convergence as N increases.

Abstract

Experimental data analysis is a key activity in metrology, the science of measurement.
It involves developing a mathematical model of the physical system in terms of mathematical
equations involving parameters that describe all the relevant aspects of the system.
The model specifies how the system is expected to respond to input data and the nature
of the uncertainties in the inputs. Given measurement data, estimates of the model parameters
are determined by solving the mathematical equations constructed as part of
the model, and this requires developing an algorithm (or estimator) to determine values
for the parameters that best explain the data. In many cases, the parameter estimates
are given by the solution of a least-squares problem. This paper discusses how various
uncertainty structures associated with the measurement data can be taken into consideration
and describes the algorithms used to solve the resulting regression problems. Two
applications from NPL are described which require the solution of generalised distance
regression problems: the use of measurements of primary standard natural gas mixtures
to estimate the composition of a new natural gas mixture, and the analysis of calibration
data to estimate the effective area of a pressure balance.

1  Introduction 

Many metrology experiments involve determining the behaviour of a response variable $y$
as a function of a set of independent variables $\mathbf{x} = \{x_1, \ldots, x_n\}$. Model building
involves establishing the functional relationship between these quantities, usually involving
a set of model parameters $\mathbf{a}$, i.e.,

$$y^* = \phi(\mathbf{x}^*, \mathbf{a}),$$

where $y^*$ and $\mathbf{x}^*$ represent exact values of the variables. The terms $\mathbf{a}$ parametrize the
range of possible response behaviour and the actual behaviour is specified by determining
values for these parameters from measurement data. In practice, measurements
are subject to error, and the error structure must be taken into account firstly in order
to determine effective methods for obtaining parameter estimates and secondly in determining
the uncertainty in the fitted model parameters. For a set of measurement data
$\{\mathbf{x}_i, y_i\}_{i=1}^{m}$, the data analysis problem involves the accurate estimation of the parameters
$\mathbf{a}$, taking into account knowledge of the uncertainties in $\{\mathbf{x}_i\}$ and/or $\{y_i\}$, and typically
leads to a least-squares problem [4].

This paper describes the various uncertainty structures that arise and the corresponding
regression problems for determining estimates of the model parameters. If the covariance
information associated with the measurements is structured so that the errors within
each (ith) set of measurements are correlated only with each other, a generalised distance
regression approach is appropriate. However, some applications have quite general correlation
structure and a full Gauss-Markov estimation approach is required to make efficient use of
the statistical model [7]. This leads to a generalised Gauss-Markov regression problem to
take into account the errors in the variables and the general correlation structure. While
the covariance structure may dictate which solution algorithms are to be employed, the
information required of the model function $\phi$ is limited to the evaluation of the function
and its derivatives with respect to $\mathbf{a}$ and $\mathbf{x}$. This means that solution algorithms can
be based on a compact set of model-dependent modules and a generic set of harnessing
routines that link the models to general purpose least-squares optimisation software.

The layout of the paper is as follows. In Section 2 we consider the various error structures
and corresponding regression problems. Section 3 introduces two measurement
problems encountered at NPL: the use of measurements of primary standard natural
gas mixtures to estimate the composition of a new natural gas mixture; and the analysis
of calibration data to estimate the effective area of a pressure balance. Although
the functional models for these measurement systems are simple, taking the form of
low-order polynomials, the statistical models need to account for (a) uncertainties in
both the dependent and independent variables, and (b) possible correlations between
measurements. These requirements lead us to solve generalised regression problems. An
overview of solution algorithms for the various problems is given in Section 4. Concluding
remarks are made in Section 5.

2  Error  structures  and  regression  problems 

Within  metrology,  various  error  structures  arise  all  of  which  can  be  taken  into  account. 
We  now  consider  the  main  types. 

2.1  Error  in  one  variable  only 

2.1.1  Ordinary  (weighted)  least  squares 

The  simplest  type  of  error  structure  occurs  when  only  one  of  the  system  variables  is 
subject to error and there is no correlation between errors. The model is summarised by

$$y_i^* = \phi(\mathbf{x}_i^*, \mathbf{a}), \qquad y_i = y_i^* + \epsilon_i, \qquad \mathbf{x}_i = \mathbf{x}_i^*,$$

where it is assumed that

$$E(\epsilon_i) = 0, \qquad \mathrm{var}(\epsilon_i) = \sigma_i^2, \qquad \mathrm{cov}(\epsilon_i, \epsilon_j) = 0, \quad i \neq j. \tag{2.1}$$

Good estimates of $\mathbf{a}$ can be found by solving

$$\min_{\mathbf{a}} \sum_{i=1}^{m} w_i^2 \left[ y_i - \phi(\mathbf{x}_i, \mathbf{a}) \right]^2,$$

where $w_i = 1/\sigma_i$, $i = 1, \ldots, m$.

2.1.2 Gauss-Markov regression

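If instead of (2.1), the measurement errors are correlated so that

$$E(\boldsymbol{\epsilon}) = 0, \qquad \mathrm{var}(\boldsymbol{\epsilon}) = V,$$

with $V$ full rank, then an estimate of $\mathbf{a}$ can be found by solving

$$\min_{\mathbf{a}} \left[ \mathbf{y} - \boldsymbol{\phi}(\mathbf{a}) \right]^T V^{-1} \left[ \mathbf{y} - \boldsymbol{\phi}(\mathbf{a}) \right], \tag{2.2}$$

where the ith element of $\boldsymbol{\phi}(\mathbf{a})$ is $\phi(\mathbf{x}_i, \mathbf{a})$.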


2.2  Errors  in  more  than  one  variable 

In  many  metrological  applications  more  than  one  of  the  measured  variables  is  subject  to 
error,  and  this  must  be  taken  into  account  in  order  to  determine  estimates  of  the  model 
parameters  which  are  statistically  efficient  and  free  from  major  bias. 

2.2.1  Orthogonal  distance  regression 

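The simplest case arises when the covariance matrix associated with the ith set of measurements
is a multiple of the identity matrix and there is no correlation between any of
the errors, summarised by the model

$$y_i^* = \phi(\mathbf{x}_i^*, \mathbf{a}), \qquad y_i = y_i^* + \epsilon_i, \qquad \mathbf{x}_i = \mathbf{x}_i^* + \boldsymbol{\delta}_i,$$

$$E(\boldsymbol{\eta}_i) = 0, \qquad \mathrm{var}(\boldsymbol{\eta}_i) = \rho_i^2 I, \tag{2.3}$$

where $\boldsymbol{\eta}_i = (\epsilon_i, \boldsymbol{\delta}_i^T)^T$. In this case, appropriate estimates of the parameters are determined
by the solution of

$$\min_{\{\mathbf{x}_i^*\}, \mathbf{a}} \sum_{i=1}^{m} v_i^2 \left\{ (\mathbf{x}_i - \mathbf{x}_i^*)^T (\mathbf{x}_i - \mathbf{x}_i^*) + \left( y_i - \phi(\mathbf{x}_i^*, \mathbf{a}) \right)^2 \right\},$$

where $v_i = 1/\rho_i$, $i = 1, \ldots, m$.

Note that this optimisation problem involves $m$ sets of parameters $\mathbf{x}_i^*$ as well as the
parameters $\mathbf{a}$ specifying the model $y = \phi(\mathbf{x}, \mathbf{a})$.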
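
As an illustrative special case (not taken from the paper), consider a straight-line model $\phi(x, \mathbf{a}) = a_1 + a_2 x$ with a single independent variable. Under that assumption the inner minimisation over each $x_i^*$ can be carried out in closed form, which shows where the name orthogonal distance regression comes from:

```latex
% Footpoint for the i-th data point: minimise over x_i^*
%   (x_i - x_i^*)^2 + (y_i - a_1 - a_2 x_i^*)^2 .
% Setting the derivative with respect to x_i^* to zero gives
x_i^* = \frac{x_i + a_2 (y_i - a_1)}{1 + a_2^2},
% and substituting back, the i-th summand becomes the squared
% perpendicular distance from (x_i, y_i) to the line:
d_i^2 = \frac{(y_i - a_1 - a_2 x_i)^2}{1 + a_2^2},
% so that only the model parameters remain:
\min_{a_1, a_2} \sum_{i=1}^{m} v_i^2 \, \frac{(y_i - a_1 - a_2 x_i)^2}{1 + a_2^2}.
```

For general models no such closed form is available and the $\mathbf{x}_i^*$ have to be found numerically.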

2.2.2  Generalised  distance  regression 

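If we assume that the errors $\boldsymbol{\eta}_i$ are correlated with $\mathrm{var}(\boldsymbol{\eta}_i) = V_i$, with $V_i$ full rank, but
that $\mathrm{cov}(\boldsymbol{\eta}_i, \boldsymbol{\eta}_j) = 0$, $i \neq j$, then the appropriate regression problem is

$$\min_{\{\mathbf{x}_i^*\}, \mathbf{a}} \sum_{i=1}^{m} \begin{bmatrix} y_i - \phi(\mathbf{x}_i^*, \mathbf{a}) \\ \mathbf{x}_i - \mathbf{x}_i^* \end{bmatrix}^T V_i^{-1} \begin{bmatrix} y_i - \phi(\mathbf{x}_i^*, \mathbf{a}) \\ \mathbf{x}_i - \mathbf{x}_i^* \end{bmatrix}. \tag{2.4}$$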

2.2.3  Generalised  Gauss-Markov  regression 

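The most complicated error structure arises when all variables are subject to measurement
error and there is general correlation between the errors. If $\boldsymbol{\xi}$ ($\boldsymbol{\xi}^*$) is the vector of
measurements $\{\mathbf{x}_i\}$ (variables $\{\mathbf{x}_i^*\}$), then the corresponding regression problem is

$$\min_{\boldsymbol{\xi}^*, \mathbf{a}} \begin{bmatrix} \mathbf{y} - \boldsymbol{\phi}(\boldsymbol{\xi}^*, \mathbf{a}) \\ \boldsymbol{\xi} - \boldsymbol{\xi}^* \end{bmatrix}^T V^{-1} \begin{bmatrix} \mathbf{y} - \boldsymbol{\phi}(\boldsymbol{\xi}^*, \mathbf{a}) \\ \boldsymbol{\xi} - \boldsymbol{\xi}^* \end{bmatrix}, \tag{2.5}$$

where the ith element of $\boldsymbol{\phi}(\boldsymbol{\xi}^*, \mathbf{a})$ is $\phi(\mathbf{x}_i^*, \mathbf{a})$.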

3  Examples  from  metrology 

3.1  Preparation  of  primary  standard  natural  gas  mixtures 

Within the Centre for Optical and Analytical Measurement at NPL, one part of the
work of the Environmental Standards Group is to prepare primary standard natural gas
mixtures. These are cylinders containing natural gas prepared gravimetrically to contain
known compositions of each of the 11 constituent components (methane, ethane,
propane, i-butane, n-butane, i-pentane, n-pentane, neo-pentane, hexane, nitrogen and
carbon dioxide). Mixtures are prepared to cover various concentration ranges, e.g., methane:
64% to 98%. These primary standard mixtures are used as the basis for determining
the composition of a new mixture and hence its calorific value.

Given a number of primary standard natural gas mixtures containing known concentrations
of one of the constituent components (e.g., CO2), the detector response for
each mixture and the detector response for the new mixture, we wish to determine the
concentration of CO2 in the new mixture.

An  approach  to  solving  this  problem  is  firstly  to  use  the  calibration  data  (relating  to 
the  primary  gas  mixtures)  to  calibrate  the  detector  and,  secondly,  to  use  the  calibration 
curve  so  constructed  with  the  new  measurement  to  predict  the  concentration  in  the  new 
mixture. 
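
The two-step approach can be sketched as follows. This is only a minimal illustration under simplifying assumptions that the paper goes on to relax: the calibration curve is taken to be a straight line, it is fitted by ordinary least squares (so errors in the concentrations and any correlation are ignored), and the function names and the data in the usage note are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares straight line y = a1 + a2 * x (step 1: calibration)."""
    m = len(x)
    xbar, ybar = sum(x) / m, sum(y) / m
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    a2 = sxy / sxx
    return ybar - a2 * xbar, a2


def predict_concentration(a1, a2, response):
    """Invert the calibration line for a new detector response (step 2: prediction)."""
    return (response - a1) / a2
```

With made-up standards at concentrations 1, 2, 3, 4 and responses 2.1, 4.1, 6.1, 8.1, `fit_line` returns an intercept of about 0.1 and a slope of about 2, and a new response of 5.1 maps back to a concentration of about 2.5.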

Errors  to  be  accounted  for  are: 

•  the calibration data is known inexactly. The process of preparing the primary standards
means that they are known inexactly, and indeed the errors in the standards
may be correlated (this is a consequence of the gravimetric process used to prepare
the standard mixtures which involves comparing on a balance each standard mixture
at each stage of preparation against calibrated masses selected from a common
set of masses),

•  the data returned by the detector (which is based on the analytical technique of
chromatography) is subject to measurement error.

Consequently, we wish our data analysis to account for the inexactness of the measurement
data and to quantify the resulting uncertainty associated with the final measurement
result.

Figure 1 shows a sample set of measurement data, with the ellipses around the calibration
points illustrating the errors in the concentrations and detector responses. (The
error ellipses have been magnified greatly for illustrative purposes.) The figure also shows
a calibration curve which is used to estimate the concentration of the component for
which the detector response (and its uncertainty) is known.

Fig. 1. Sample data (+), fitted calibration curve and predicted measurement (o).

3.2 Calibration of pressure balances

The principal role of the Pressure and Vacuum Section in the Centre for Mechanical and
Acoustical Metrology at NPL is the development and maintenance of primary measurement
standards for pressure and vacuum and their dissemination to industry. Pressure
balances are pressure generators and consist essentially of finely-machined pistons mounted
vertically in close-fitting cylinders. The pressure required to support a piston and
associated ring-weights depends on the mass of the piston and ring-weights and the
cross-sectional area of the piston [5]. Due to various fluid dynamic effects, the effective
area $A(p, \mathbf{a})$ of the piston-cylinder assembly is a function of pressure, usually taken to
be a linear function $A(p, \mathbf{a}) = a_1 + a_2 p$. Many other factors such as temperature and air
buoyancy have to be taken into account but for our purposes here, the pressure generated
satisfies

$$a_1 p + a_2 p^2 = y(m),$$

where $\mathbf{a}$ are the instrument parameters and $y(m)$ is a simple function of the applied load
$m$. This equation determines $p$ implicitly as a function of $m$ and $\mathbf{a}$. Suppose a reference
pressure balance has been calibrated so that estimates of the instrument parameters
$\mathbf{a}$ and their uncertainties are known. The reference balance can be used to calibrate a
test balance in a cross-floating experiment in the following way. A load $m_i$ is applied
to the reference balance to generate pressure $p_i = p(m_i, \mathbf{a})$. A load $n_i$ is applied to the
test balance so that the pressures generated are matched. The test calibration curve is
determined from a best fit to the data $(n_i, p_i)$

$$b_1 p_i^* + b_2 (p_i^*)^2 = y(n_i^*), \qquad p_i = p_i^* + \delta_i, \qquad n_i = n_i^* + \epsilon_i,$$

where $\delta_i$ and $\epsilon_i$ represent measurement error associated with the pressures and masses,
respectively. However, the following must be taken into account. Firstly, the pressures
$p_i$ all depend on the common estimates $\mathbf{a}$ of the instrument parameters of the reference
balance, leading to correlation of the measurement errors $\delta_i$. Secondly, the masses $n_i$
and $m_i$ are made up from the same ensemble of masses $\boldsymbol{\mu} = (\mu_1, \ldots, \mu_M)^T$ so that

$$n_i = \mathbf{n}_i^T \boldsymbol{\mu}, \qquad m_i = \mathbf{m}_i^T \boldsymbol{\mu},$$

where $\mathbf{n}_i$ and $\mathbf{m}_i$ are binary coefficient vectors. This means that measurement errors
associated with the masses $\mu_k$ give rise to (further) correlation between $\delta_i$ and $\epsilon_i$. Taking
this general correlation into account, estimates of the instrument parameters $\mathbf{b}$ are
found from solving

$$\min_{\mathbf{b}, \mathbf{p}^*} \begin{bmatrix} \mathbf{y} - \boldsymbol{\phi} \\ \mathbf{p} - \mathbf{p}^* \end{bmatrix}^T V^{-1} \begin{bmatrix} \mathbf{y} - \boldsymbol{\phi} \\ \mathbf{p} - \mathbf{p}^* \end{bmatrix}, \tag{3.1}$$

where the ith elements of $\boldsymbol{\phi}$ and $\mathbf{y}$ are $b_1 p_i^* + b_2 (p_i^*)^2$ and $y(n_i)$, respectively, and $V$ is
the appropriate covariance matrix determined from the dependence of $\mathbf{y}$ and $\boldsymbol{\phi}$ on $\mathbf{a}$ and
$\boldsymbol{\mu}$. This is a generalised Gauss-Markov regression problem.
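
The statement above that $a_1 p + a_2 p^2 = y(m)$ determines $p$ implicitly can be made concrete with a small sketch. It assumes $a_1 > 0$ and picks the root that reduces to $y/a_1$ when $a_2 = 0$; the function name and the rearranged quadratic formula are choices made here for illustration, not something specified in the paper.

```python
import math


def pressure_from_load(y, a1, a2):
    """Solve a1*p + a2*p**2 = y for the root that tends to y/a1 as a2 -> 0.

    The rearranged form 2*y / (a1 + sqrt(a1**2 + 4*a2*y)) avoids the
    cancellation that the textbook quadratic formula suffers for small a2.
    Assumes a1 > 0.
    """
    disc = a1 * a1 + 4.0 * a2 * y
    if disc < 0.0:
        raise ValueError("no real pressure satisfies the equation")
    return 2.0 * y / (a1 + math.sqrt(disc))
```

For example, with the made-up values `a1 = 2.0`, `a2 = 0.5` and `y = 2.5` the function returns 1.0, since 2(1) + 0.5(1)^2 = 2.5.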


4  Algorithms  for  generalised  regression 

Algorithms for ordinary least squares problems of the form $\min_{\mathbf{a}} \sum_i f_i^2(\mathbf{a})$ are well known
and include QR factorisation methods for linear models or the Gauss-Newton algorithm
for non-linear models; see, e.g., [2, 6]. The latter algorithm requires the user to supply a
software module to evaluate the vector of function values $\mathbf{f}(\mathbf{a})$ and the Jacobian matrix
$J$ of partial derivatives

$$J_{ij} = \frac{\partial f_i}{\partial a_j}.$$

If $f_i(\mathbf{a}) = y_i - \phi(\mathbf{x}_i, \mathbf{a})$ as considered above, the user has to supply a module to calculate
$\phi(\mathbf{x}, \mathbf{a})$ and $\partial\phi/\partial a_j$.

If $V$ is symmetric and strictly positive definite, the Gauss-Markov regression problem
(2.2) can be formulated as an ordinary least squares problem. If $V = LL^T$ where $L$ is
lower-triangular, then the problem becomes

$$\min_{\mathbf{a}} \sum_i \tilde{f}_i^2(\mathbf{a}),$$

where $\tilde{\mathbf{f}} = L^{-1}\mathbf{f}$. The associated Jacobian matrix is $\tilde{J} = L^{-1}J$. If the matrix $V$ is
well-conditioned, matrix operations with $V$ or $L^{-1}$ should not lead to unnecessary loss
of precision. However, explicit calculations involving $V$ can be avoided by using the
generalised QR factorisation [2, 8, 9], leading to solution algorithms with good numerical
properties.
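
The change of variables $\tilde{\mathbf{f}} = L^{-1}\mathbf{f}$ described above can be sketched in a few lines. This is a plain Cholesky-based illustration for a small, well-conditioned $V$, not the generalised QR approach that the text recommends for numerical robustness; the function names are hypothetical.

```python
import math


def cholesky(V):
    """Lower-triangular L with V = L L^T, for symmetric positive definite V."""
    n = len(V)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = V[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def forward_solve(L, f):
    """Solve L z = f by forward substitution, i.e. z = L^{-1} f without forming L^{-1}."""
    z = []
    for i, fi in enumerate(f):
        z.append((fi - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z
```

For `V = [[4, 2], [2, 3]]` and `f = [2, 1]` the whitened residual is `[1, 0]`, whose sum of squares, 1, equals $\mathbf{f}^T V^{-1} \mathbf{f}$. The same forward substitution applied to each column of $J$ gives $\tilde{J}$.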

The generalised distance regression problem (2.4) can be solved efficiently by making
use of the fact that the parameters $\mathbf{x}_i^*$ appear only in the ith summand. The associated
Jacobian matrix has a block-angular structure that can be exploited effectively in the
QR factorisation stage [2, 3]. Alternatively, a separation-of-variables approach can be
adopted in which the parameters $\mathbf{x}_i^*(\mathbf{a})$ are first determined as functions of $\mathbf{a}$ specified
by the solution of the corresponding footpoint problem

$$\min_{\mathbf{x}_i^*} \begin{bmatrix} y_i - \phi(\mathbf{x}_i^*, \mathbf{a}) \\ \mathbf{x}_i - \mathbf{x}_i^* \end{bmatrix}^T V_i^{-1} \begin{bmatrix} y_i - \phi(\mathbf{x}_i^*, \mathbf{a}) \\ \mathbf{x}_i - \mathbf{x}_i^* \end{bmatrix},$$

and the problem formulated as a non-linear least squares problem in $\mathbf{a}$ [1, 4]. Either
approach yields an algorithm requiring 0(r7m2) flops while a full matrix approach requires
$O(m^3)$ flops.
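
For a model that is linear in $x$, the footpoint problem has a closed-form solution, which makes the separation-of-variables idea easy to see. The sketch below assumes a straight-line model $\phi(x, \mathbf{a}) = a_1 + a_2 x$ with a scalar $x$ and takes the 2 x 2 inverse covariance $V_i^{-1}$ as input; the function name and argument layout are hypothetical, and a general model would need an iterative inner solve instead.

```python
def footpoint_line(x, y, a, W):
    """Footpoint x* and generalised distance for phi(x, a) = a[0] + a[1] * x.

    W is the symmetric 2 x 2 inverse covariance V_i^{-1} of the residual vector
    r = [y - phi(x*, a), x - x*], which is linear in x*:  r = r0 - x* * g.
    Minimising r^T W r over x* gives x* = (g^T W r0) / (g^T W g).
    """
    r0 = [y - a[0], x]
    g = [a[1], 1.0]
    Wg = [W[0][0] * g[0] + W[0][1] * g[1], W[1][0] * g[0] + W[1][1] * g[1]]
    x_star = (Wg[0] * r0[0] + Wg[1] * r0[1]) / (Wg[0] * g[0] + Wg[1] * g[1])
    r = [r0[0] - x_star * g[0], r0[1] - x_star * g[1]]
    d2 = sum(r[p] * W[p][q] * r[q] for p in range(2) for q in range(2))
    return x_star, d2
```

With `W` equal to the identity, the line $y = x$ and the data point (0, 2), the footpoint is $x^* = 1$ and the squared distance is 2, the usual orthogonal projection. Summing `d2` over the data points then gives a function of $\mathbf{a}$ alone, which is the outer non-linear least squares problem.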


The generalised Gauss-Markov problem (2.5) can be solved as a Gauss-Markov problem
in the variables $\{\mathbf{x}_i^*\}$ and $\mathbf{a}$, but ideally we would like to develop algorithms
that exploit problem structure as in generalised distance regression algorithms. In particular,
while the covariance matrix $V$ may well be full, in many situations it is constructed
from smaller matrices, for which more efficient algorithms could be developed.

From the user’s point of view, all the regression algorithms discussed here require only
the calculation of the model function $\phi$ and its derivatives $\partial\phi/\partial\mathbf{a}$ and $\partial\phi/\partial\mathbf{x}$. Thus, a wide
range of regression problems can be solved using standard optimisation modules along
with generic harness modules that perform the conversion without input from the user
over and above the calculation of $\phi$ and its derivatives. For example, we have implemented
a generalised Gauss-Markov solver to solve problems such as (3.1) for any explicit model
$y = \phi(\mathbf{x}, \mathbf{a})$. However, issues of efficiency and numerical stability need to be taken into
account. As part of the UK Department of Trade and Industry’s Software Support for
Metrology programme, NPL is developing and making available to metrologists a suite of
routines for the generalised regression problems discussed above. By combining structure-exploiting
linear algebra and numerically stable components such as the orthogonal
factorisation, it is hoped that metrologists will be able to use these routines with the
same confidence and effectiveness that they currently experience with standard, well-engineered
regression modules available in numerical libraries.
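
The split between a small model-dependent module and a generic solver can be illustrated with a minimal Gauss-Newton sketch. Everything here is an assumption made for illustration: the exponential model, the function names, the restriction to two parameters, the use of normal equations (solved by Cramer's rule) in place of the QR factorisation the text recommends, and the simple step-halving safeguard.

```python
import math


def model(x, a):
    """Model-dependent module (hypothetical example): phi = a1 * exp(a2 * x)."""
    return a[0] * math.exp(a[1] * x)


def model_grad(x, a):
    """Derivatives of phi with respect to a1 and a2."""
    e = math.exp(a[1] * x)
    return [e, a[0] * x * e]


def gauss_newton(xs, ys, a, iterations=50):
    """Generic two-parameter Gauss-Newton loop with step halving."""
    def sse(p):
        return sum((y - model(x, p)) ** 2 for x, y in zip(xs, ys))

    for _ in range(iterations):
        f = [y - model(x, a) for x, y in zip(xs, ys)]
        J = [[-g for g in model_grad(x, a)] for x in xs]  # J_ij = d f_i / d a_j
        # Normal equations (J^T J) d = -J^T f for the 2 x 2 case.
        s11 = sum(r[0] * r[0] for r in J)
        s12 = sum(r[0] * r[1] for r in J)
        s22 = sum(r[1] * r[1] for r in J)
        g1 = -sum(r[0] * fi for r, fi in zip(J, f))
        g2 = -sum(r[1] * fi for r, fi in zip(J, f))
        det = s11 * s22 - s12 * s12
        d = [(g1 * s22 - s12 * g2) / det, (s11 * g2 - s12 * g1) / det]
        # Halve the step until the sum of squares does not increase.
        step, current = 1.0, sse(a)
        while step > 1e-8 and sse([a[0] + step * d[0], a[1] + step * d[1]]) > current:
            step *= 0.5
        a = [a[0] + step * d[0], a[1] + step * d[1]]
    return a
```

Only `model` and `model_grad` know anything about the physical system; swapping in another two-parameter model leaves `gauss_newton` unchanged, which is the design point made in the paragraph above.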

5  Concluding  remarks 

In metrology, we are interested in the determination of accurate estimates of the parameters
that describe a physical process. It is imperative that knowledge of the measurement
system should be used to describe the error structure as accurately as possible. We have
described the five types of regression problems that can occur in metrology depending
on the error structures that are assumed. In all cases it is important that we employ
efficient, numerically stable algorithms and exploit any structure in both the Jacobian
and covariance matrices.

Abstract

We consider the problem of nonparametric regression. The aim is to get a smooth function
which represents the dataset and has a reasonable number of extreme values. An iterative
method, the QSOR method, is introduced. Problems with the slow convergence of the
method are reduced using multigrid techniques.


1  Introduction 

Given a dataset $\{y(t_i), i = 1, \ldots, n\}$ which we denote by $y$, we look for a decomposition

$$y(t_i) = f(t_i) + r(t_i), \qquad (t_i = i/n, \; i = 1, \ldots, n)$$

where $f$ is a simple function and the $\{r(t_i), i = 1, \ldots, n\}$ are the resulting residuals
which approximate white noise. We use two different concepts of simplicity. The first
is the number of local extreme values. The second is the smoothness of the function as
measured by the standard smoothness functional

$$S(f) := \int_0^1 \left( f^{(2)}(t) \right)^2 dt,$$

where $f^{(2)}$ is the second derivative of $f$. The number of local extremes is taken to have priority
over smoothness. The number of local extremes and their locations are determined
by the taut string method developed in [3]. This is described briefly in the next section.
The residuals are required to look like white noise in the sense that the means over certain
dyadic intervals are required to lie within given bounds [3]. The multiresolution coefficients
for $n = 2^\nu$ are defined by

$$w_{i,j} := 2^{-i/2} \sum_{k = j 2^i + 1}^{(j+1) 2^i} r(t_k), \qquad (i = 0, \ldots, \nu), \quad (j = 0, \ldots, 2^{\nu - i} - 1).$$

The multiresolution condition now requires that $-c_n < w_{i,j} < c_n$,
where $c_n$ represents some form of thresholding. The default value of $c_n$ which we use is
$c_n = \sigma_n \sqrt{2.5 \log(n)}$ where $\sigma_n = 1.482 \cdot \mathrm{median}(|y_2 - y_1|, \ldots, |y_n - y_{n-1}|)/\sqrt{2}$.

Fig. 1. The top-left panel shows the original doppler function and the top-right
panel shows the noisy version. The bottom-left panel shows the result of the taut
string algorithm with the resulting residuals being shown in the bottom-right panel.
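
The multiresolution check described in the Introduction can be sketched as follows. The sketch assumes that the coefficients are normalised sums of the residuals over dyadic blocks of length $2^i$ (block $j$ covering indices $j2^i + 1, \ldots, (j+1)2^i$), and it uses the default threshold with the natural logarithm; the function names are hypothetical.

```python
import math
from statistics import median


def noise_scale(y):
    """sigma_n = 1.482 * median(|y_{k+1} - y_k|) / sqrt(2)."""
    diffs = [abs(b - a) for a, b in zip(y, y[1:])]
    return 1.482 * median(diffs) / math.sqrt(2.0)


def multiresolution_coefficients(r):
    """Return {(i, j): w_ij} for residuals r of length n = 2**nu."""
    n = len(r)
    nu = n.bit_length() - 1
    if 2 ** nu != n:
        raise ValueError("length must be a power of two")
    w = {}
    for i in range(nu + 1):
        block = 2 ** i
        for j in range(n // block):
            w[(i, j)] = sum(r[j * block:(j + 1) * block]) / math.sqrt(block)
    return w


def satisfies_multiresolution(y, f):
    """True if every coefficient of the residuals y - f lies strictly inside (-c_n, c_n)."""
    r = [yk - fk for yk, fk in zip(y, f)]
    c_n = noise_scale(y) * math.sqrt(2.5 * math.log(len(y)))
    return all(abs(v) < c_n for v in multiresolution_coefficients(r).values())
```

A candidate fit whose residuals have a large mean on some dyadic block fails the check on that block, which is the signal used in the next section to tighten the tube locally.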

2  Taut  string 

A short description of the taut string method is as follows. We write $f = (f_1, \ldots, f_n)^T :=
(f(t_1), \ldots, f(t_n))^T \in \mathbb{R}^n$ and denote the cumulative sums of $y$ and $f$ by $Y$ and $F$
respectively, $Y_i = \sum_{j=1}^{i} y_j$, $F_i = \sum_{j=1}^{i} f_j$, $(i = 0, \ldots, n)$, with $Y_0 = F_0 = 0$. We specify
bounds defined by $\lambda = (\lambda_1, \ldots, \lambda_n)^T \in \mathbb{R}_+^n$ and consider the tube

$$\{G : |Y - G| \le \lambda\}. \tag{2.1}$$

The taut string $V(\lambda)$ is now the function defined by a taut string attached to the
points $(0, Y_0)$ and $(n, Y_n)$ and constrained to lie within the tube (2.1). It can be shown
that the taut string minimizes the number of extreme values of the functions $g$ whose
cumulative sums $G$ lie within the tube. The taut string is continuous and piecewise
linear. Its derivative $v(\lambda)$ is taken as a candidate regression function. The vector $\lambda$ is
determined in a data dependent manner by the requirement that the residuals associated
with $v(\lambda)$, $\{r(\lambda)_i = y_i - v(\lambda)_i, \; i = 1, \ldots, n\}$, satisfy the multiresolution condition. If
such a condition fails on an interval then the $\lambda$-values associated with that interval
are reduced in size. An application of the taut string method to the doppler data of
Donoho and Johnstone (see e.g. [4]) is shown in Figure 1. The function is defined by

$$f(t) = 21 \sqrt{t(1 - t)} \, \sin\left( \frac{2\pi \cdot 1.05}{t + 0.05} \right).$$

The derivative $v(\lambda)$ is piecewise constant as may be seen from Figure 1. The function
$v(\lambda)$ determines the number of local extremes. We take the midpoints of the intervals
associated with the local extremes as the locations of
the local extremes for the smoothing algorithm.
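
The two ingredients of the description above, membership of the tube and the number of local extremes, can be checked with a few lines of code. The sketch does not compute the taut string itself; it only tests whether a given candidate $g$ has cumulative sums inside the tube around $Y$ and counts its local extremes. The function names are hypothetical.

```python
from itertools import accumulate


def cumulative_sums(values):
    """Return [Y_0, Y_1, ..., Y_n] with Y_0 = 0 and Y_i = values[0] + ... + values[i-1]."""
    return [0.0] + list(accumulate(values))


def in_tube(y, g, lam):
    """True if the cumulative sums G of g satisfy |Y_i - G_i| <= lam_i for i = 1..n."""
    Y, G = cumulative_sums(y), cumulative_sums(g)
    return all(abs(Yi - Gi) <= li for Yi, Gi, li in zip(Y[1:], G[1:], lam))


def count_local_extremes(g):
    """Count changes of direction in g, ignoring flat stretches."""
    signs = [1 if b > a else -1 for a, b in zip(g, g[1:]) if b != a]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)
```

For the data `y = [1, 3, 2, 4]` the constant function 2.5 has no local extremes and lies in the tube of half-width 1.5, but not in the tube of half-width 1.0, so shrinking the tube forces the candidate to follow the data more closely.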
3 The smoothing problem

We make the smoothing problem precise as follows. The number, locations and type of
extreme values are taken from the taut string as explained in the last section. We further
require the function $f$ to lie in the tube determined by the taut string. This is to prevent
the smoothing procedure from moving too far from the data. These restrictions may be
described in the form

$$Af \ge b \tag{3.1}$$

for an appropriate matrix $A$ and vector $b$. This leads to the following problem:

minimize $\sum_i (f_{i+1} - 2f_i + f_{i-1})^2$ subject to (3.1),

or equivalently

minimize $F^T Q_3 F$ subject to (3.1),

for some quadratic form $Q_3$. We denote this latter quadratic programming problem by
QP3. Clearly the matrix associated with the quadratic form $\sum_i (f_{i+1} - 2f_i + f_{i-1})^2$
is singular. Nevertheless the solution of QP3 may be unique. We have the following
theorem.

Theorem 3.1 Let $V(\lambda)$ be the result of the taut string method. Assume that $V(\lambda)$ has
one extreme value. We define the bounds $L$, $U$ by

$$L := Y - \lambda, \qquad U := Y + \lambda.$$

Let $F_1$, $F_2$ be two solutions of the corresponding quadratic program. Additionally let $F_1$
touch three bounds alternately (i.e. $U_{i_1}, L_{i_2}, U_{i_3}$ or $L_{i_1}, U_{i_2}, L_{i_3}$, $i_1 < i_2 < i_3$, are active).
Then

$$F_1 = F_2.$$

We call a problem with a unique solution a nondegenerate problem. From now on we
assume that our problem is nondegenerate.
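
The remark that the matrix of the second-difference quadratic form is singular can be checked numerically. The sketch below works directly in the function values $f_i$ (so the matrix built here is the one belonging to $\sum_i (f_{i+1} - 2f_i + f_{i-1})^2$, not $Q_3$ itself, which acts on cumulative sums), and the function names are hypothetical.

```python
def second_difference_energy(f):
    """Sum of squared second differences, sum_i (f[i+1] - 2 f[i] + f[i-1])**2."""
    return sum((f[i + 1] - 2.0 * f[i] + f[i - 1]) ** 2 for i in range(1, len(f) - 1))


def second_difference_matrix(n):
    """Dense n x n matrix Q with f^T Q f == second_difference_energy(f)."""
    Q = [[0.0] * n for _ in range(n)]
    stencil = (1.0, -2.0, 1.0)
    for i in range(1, n - 1):
        for p, cp in enumerate(stencil):
            for q, cq in enumerate(stencil):
                Q[i - 1 + p][i - 1 + q] += cp * cq
    return Q


def mat_vec(Q, f):
    """Plain matrix-vector product."""
    return [sum(qij * fj for qij, fj in zip(row, f)) for row in Q]
```

Any linear sequence such as `[1, 2, 3, 4, 5]` is mapped to the zero vector by this matrix, so the matrix has a non-trivial null space and uniqueness has to come from the constraints, as Theorem 3.1 states.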


3.1  Quadratic  programming 

There are many algorithms which solve quadratic programming problems directly. Unfortunately
most of them are expensive in terms of memory requirements and are not
feasible for data sets of the order of, say, n = 8196. To overcome this we look for iterative
methods which converge to the solution. Gradient projection methods (e.g. as defined
in [8], [2] or [9]) are not appropriate for this purpose as the monotonicity constraints
make the projection onto the feasible set too expensive. Instead we use a modified version
of the QSOR (quasi successive over relaxation) method developed by Metzner in
[7]. QSOR is a very cheap iteration and converges to the solution of QP3. Unfortunately
the convergence is very slow on sections where the solution is smooth. To overcome this
we use multigrid methods which have to be adapted to our requirements.

4 QSOR

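The QSOR algorithm is an iterative method which produces a feasible sequence $\{F^k\}_{k=0}^{\infty}$
converging towards the solution of QP3. For simplicity, we describe the iteration only for
a convexity interval. Let $F^0 \in \mathbb{R}^n$ be an arbitrary feasible vector. The obvious candidate
is the derivative of the taut string. Let $Q = Q_3$ and $\omega \in (0, 2)$. The following defines a
QSOR iteration.

•  While convergence not achieved

   •  $F = F^k$

      $i = 1$:
      $\tilde{F}_i = F_i - \frac{\omega}{Q_{ii}} (QF)_i$, $\tilde{L}_i = \max\{2F_{i+1} - F_{i+2}, L_i\}$, $\tilde{U}_i = U_i$, $F_i = \mathrm{med}\{\tilde{L}_i, \tilde{U}_i, \tilde{F}_i\}$

      $i = 2$:
      $\tilde{F}_i = F_i - \frac{\omega}{Q_{ii}} (QF)_i$, $\tilde{L}_i = \max\{2F_{i+1} - F_{i+2}, L_i\}$,
      $\tilde{U}_i = \min\{(F_{i+1} + F_{i-1})/2, U_i\}$, $F_i = \mathrm{med}\{\tilde{L}_i, \tilde{U}_i, \tilde{F}_i\}$

   •  for (i in 3:(n-2)) {
      $\tilde{F}_i = F_i - \frac{\omega}{Q_{ii}} (QF)_i$, $\tilde{L}_i = \max\{2F_{i+1} - F_{i+2}, 2F_{i-1} - F_{i-2}, L_i\}$
      $\tilde{U}_i = \min\{(F_{i+1} + F_{i-1})/2, U_i\}$, $F_i = \mathrm{med}\{\tilde{L}_i, \tilde{U}_i, \tilde{F}_i\}$
      if (i active) mark i
      }

      $i = n$:
      $\tilde{F}_i = F_i - \frac{\omega}{Q_{ii}} (QF)_i$, $\tilde{L}_i = \max\{2F_{i-1} - F_{i-2}, L_i\}$, $\tilde{U}_i = U_i$, $F_i = \mathrm{med}\{\tilde{L}_i, \tilde{U}_i, \tilde{F}_i\}$

      $i = 1$:
      $\tilde{L}_i = \max\{2F_{i+1} - F_{i+2}, L_i\}$,
      $\tilde{U}_i = U_i$, $F_i = \mathrm{med}\{\tilde{L}_i, \tilde{U}_i, F_i\}$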

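   •  correct the active intervals:

      *  Let $[F_v, F_{v+k}]$ be an active interval: $F_i = F_v + \frac{i - v}{k} (F_{v+k} - F_v)$. Denoting
         the $i$th unit vector in $\mathbb{R}^n$ by $e_i$, and with $a$, $b$ defined as the minimizers over $(x, y)$ of

         $$\Bigl( F - \bigl( x \sum_{i=v}^{v+k} t_i e_i + y \sum_{i=v}^{v+k} e_i \bigr) \Bigr)^T Q \Bigl( F - \bigl( x \sum_{i=v}^{v+k} t_i e_i + y \sum_{i=v}^{v+k} e_i \bigr) \Bigr),$$

         set $F_i^{k+1} := F_i - \alpha (a t_i + b)$ on the interval, with $\alpha$ chosen so that $F^{k+1}$ remains feasible

   •  $F_i^{k+1} = F_i$ for all $i$ in other intervals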
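
The core of each QSOR step, a relaxation update followed by a median that clips the trial value to the current bounds, can be illustrated with a deliberately simplified stand-in: projected SOR for a quadratic objective with fixed box constraints. The convexity bounds that depend on neighbouring values and the active-interval correction are omitted, a linear term `c` is added so that the toy problem has a non-trivial solution, and the function names are hypothetical.

```python
def median3(a, b, c):
    """Median of three numbers; med{L, U, x} clips x to [L, U] when L <= U."""
    return sorted((a, b, c))[1]


def projected_sor(Q, c, lower, upper, x0, omega=1.0, sweeps=200):
    """Minimise 0.5 x^T Q x - c^T x subject to lower <= x <= upper.

    Q must be symmetric with positive diagonal and x0 must be feasible.
    Each component is relaxed in turn and clipped to its bounds, so every
    iterate stays feasible.
    """
    x = list(x0)
    n = len(x)
    for _ in range(sweeps):
        for i in range(n):
            grad_i = sum(Q[i][j] * x[j] for j in range(n)) - c[i]
            trial = x[i] - omega * grad_i / Q[i][i]
            x[i] = median3(lower[i], upper[i], trial)
    return x
```

With `Q = [[2, -1], [-1, 2]]` and `c = [1, 1]` the unconstrained minimiser is (1, 1); with upper bounds of 0.5 the iteration stops at (0.5, 0.5), where both bounds are active.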

Theorem 4.1 (convergence) Let $(F^k)_{k=0}^{\infty}$ be the sequence in $\mathbb{R}^n$ produced by the QSOR
algorithm and let the problem QP3 be nondegenerate. Then

•  $(F^k)_{k=0}^{\infty}$ converges in $\mathbb{R}^n$.

•  $F^* := \lim_{k \to \infty} F^k$ is the solution of QP3.

Fig. 2. The panels top-left, top-right, bottom-left, bottom-right show the result of
the QSOR iteration for the doppler data (n = 2048) after 1000, 5000, 10000, 20000 steps
respectively.


The slowness of the convergence can be seen from the fact that the doppler data of Figure 1
required two million iterations before a satisfactory solution was obtained. This is shown
in Figure 2. After a small number of iterations the solution does not change any more
on the “left side” where the function oscillates rapidly. After 1000 iterations of QSOR
(which is fast because one QSOR step is very cheap!) the solution looks very smooth
except for a few “buckles” on the “right side” of the solution. The method needs many
iterations (up to two million) to reach an adequate smoothness. The slowness of the
convergence is known from the original SOR method for solving linear equations. In the
standard case of solving linear equations multigrid methods can be used to speed up the
rate of convergence. We now apply this idea to solving the problem QP3.


5  Multigrid  QSOR 


Multigrid techniques are general techniques to speed up iterative methods, and they
also have other good properties. The ideas are given for example in [1] or [5]. We will give here
a short description of the multigrid idea in our case. First some notation. Given a grid
$G = G_f = (t_1, \ldots, t_n)$ we define the coarse grid $G_c = (t_{i_1}, \ldots, t_{i_m})$, $i_1 = 1$, $i_m = n$,
with $i_j \in \{1, \ldots, n\}$. We define the projection down $I_f^c F = (F_1, F_{i_2}, \ldots, F_{i_{m-1}}, F_n)^T$
and the projection up $I_c^f x = y$ where $y_l = x_j$ for $l = i_j$ $(j = 1, \ldots, m)$, and by linear
interpolation elsewhere, i.e.,

$$y_l = x_{j-1} + \frac{t_l - t_{i_{j-1}}}{t_{i_j} - t_{i_{j-1}}} (x_j - x_{j-1}), \qquad (i_{j-1} < l < i_j).$$
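
The two grid-transfer operations can be sketched as follows, assuming that the projection down is plain injection at the coarse indices and that the projection up interpolates linearly in $t$ between neighbouring coarse points. Indices are 0-based in the code, the coarse index list must start at the first and end at the last fine-grid point, and the function names are hypothetical.

```python
def restrict(F, idx):
    """Projection down: keep the fine-grid values at the coarse indices idx."""
    return [F[i] for i in idx]


def prolong(x, idx, t):
    """Projection up: copy coarse values to their fine-grid positions and
    interpolate linearly in t between consecutive coarse points.

    Requires idx[0] == 0, idx[-1] == len(t) - 1 and idx strictly increasing.
    """
    y = [0.0] * len(t)
    for j in range(len(idx) - 1):
        lo, hi = idx[j], idx[j + 1]
        for l in range(lo, hi + 1):
            w = (t[l] - t[lo]) / (t[hi] - t[lo])
            y[l] = x[j] + w * (x[j + 1] - x[j])
    return y
```

Restricting a prolonged vector returns the original coarse values, so the coarse-grid information is preserved exactly and only the points in between are filled in.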


Fig. 3. The left figure shows the result of QSOR after 5000 iterations. The right figure
shows the result of (1000) multigrid QSOR with one coarsening step (i.e. the right figure
is “cheaper” than 2000 QSOR steps).

We define now the multigrid QSOR with only two grids, i.e., of level two. The general
case of level $\nu \in \mathbb{N}$ is defined similarly. Let $\mathrm{QSOR}(G, A, b, \mu, x)$ denote the result of the
QSOR method applied to the problem on the grid $G$ after $\mu$ iterations
with starting vector $x$ and constraints defined by $A$, $b$. Additionally let $F^k$ be given.

•  Multigrid QSOR

   *  while precision not achieved
      o  $F = \mathrm{QSOR}(G, A, b, \mu, F^k)$
      o  $F = I_c^f \, \mathrm{QSOR}(G_c, A_c, b_c, \mu, I_f^c F)$ (projection down $I_f^c$, coarse-grid QSOR, projection up $I_c^f$)
      o  $F^{k+1} = \mathrm{QSOR}(G, A, b, \mu, F)$
where $A_c$, $b_c$ are the corresponding constraints for the coarser grid. The question is now
how to define the projection of the constraints. One can think of an example where
the canonical projection of the bounds (as for $G_c$) can fail. This happens for example if strong
constraints (e.g. tight bounds) are not on the coarse grid. To overcome this problem one
has to think of a method of defining the problem QP3 on the coarser grid in a reasonable
way. One way to handle this problem is to define $L^c_j := \max\{L_k \mid i_{j-1} < k < i_{j+1}\}$ and
to use “min” analogously for the upper bounds. Special cases have to be treated but we do not go into
details here. A coarser grid means that the QSOR step on this grid converges much
faster. On the other hand the approximation of the solution gets worse by coarsening
the grid. In our case (see Figure 4) we have $n = 2048$. The coarsest grid was made by
taking every eighth gridpoint. We iterated until there was no recognizable improvement.
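
The proposed projection of the bounds can be sketched as below. Each coarse point takes the largest lower bound and the smallest upper bound found on the open index range between its two coarse neighbours. The treatment of the first and last coarse points (using only the one-sided range) is an assumption made here, since the text leaves the special cases open, and the function name is hypothetical.

```python
def coarse_bounds(L, U, idx):
    """Coarse-grid bounds from fine-grid bounds L, U at coarse indices idx (0-based).

    For coarse point j the fine indices k with idx[j-1] < k < idx[j+1] are scanned;
    at the two ends only the existing neighbour limits the range.
    """
    Lc, Uc = [], []
    for j, centre in enumerate(idx):
        start = idx[j - 1] + 1 if j > 0 else centre
        stop = idx[j + 1] if j + 1 < len(idx) else centre + 1
        Lc.append(max(L[start:stop]))
        Uc.append(min(U[start:stop]))
    return Lc, Uc
```

A tight fine-grid bound that falls between two coarse points is thereby inherited by both of them. The coarse bounds can cross (a coarse lower bound above the coarse upper bound), which is presumably one of the special cases that need separate treatment.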


6  Proofs 

Proof of Theorem 3.1: We set $D = F_2 - F_1$. One simply verifies that $D$ has to be a
line, i.e., there are $a, b \in \mathbb{R}$ such that $D_i = a t_i + b$. Touching three bounds alternately
means that $D$ changes its sign at least two times which leads to $D = 0$. □

Fig. 4. Multigrid QSOR applied to the doppler data with n = 2048. The figure took
less than 6 seconds to compute, compared with three hours without multigrid on the same computer.


Proof  of  Theorem  4.1:  We  set  Q  =  Qs.  We  have  to  show: 

1) $(S_3(F_k))_{k \in \mathbb{N}_0}$ decreases;

2) $(F_k)_{k \in \mathbb{N}}$ is feasible;

3) If $F^s$ is a stationary point of QSOR, then $F^s$ minimizes $S_3$ in the feasible set.

Convergence then follows because

• our feasible set is compact, so the sequence has a convergent subsequence,

• a limit of a subsequence of $(F_k)_{k=1}^{\infty}$ is a stationary point of QSOR,

• the problem has only one solution.

To  the  first  point,  we  only  remark  that  a,  b  as  defined  in  the  algorithm,  are  the  minimizers 
of  the  term: 

(v+k  v+k  \\T  (  (  v+fc 

X  T  ti6i  +  yy2  tiCi  I  ]  Qlz-lxY'  Uei  +  yY]  t«c< 

The others are treated as in [6]. The second point is clear, because by the definition we
start with a feasible vector and we retain the feasibility in every single step. It remains
to show the third point. Let $F^s$ be a stationary point of the algorithm. It is sufficient to
show that $\langle QF^s, Z - F^s \rangle \ge 0$ for all feasible vectors $Z$ (see [6]), where $\langle \cdot, \cdot \rangle$ denotes the
standard inner product in $\mathbb{R}^n$. To show this we first note that $Q = D^T Q_M D$, where

$$D = \begin{pmatrix} 1 & & & \\ -1 & 1 & & \\ & \ddots & \ddots & \\ & & -1 & 1 \end{pmatrix}$$

and $Q_M$ is the matrix according to QP3, i.e., to the direct problem. So we can deduce
that $\langle QF^s, Z - F^s \rangle = (Z - F^s)^T Q F^s = (Z - F^s)^T D^T Q_M D F^s = \langle Q_M f^s, z - f^s \rangle$
($f^s := DF^s$, $z := DZ$). Now we only have to look at the “active points” because $(QF^s)_i$ is
zero everywhere else. Let $Z$ be an arbitrary feasible vector and $j$ be an index with
$F^s_j = L_j$ and $(QF^s)_j \ne 0$, so $-\omega (QF^s)_j < 0$. With the feasibility of $Z$, it follows that
$(QF^s)_j (Z_j - F^s_j) = (QF^s)_j (Z_j - L_j) \ge 0$. With the same argument we can derive
$(QF^s)_j (Z_j - F^s_j) \ge 0$ if $F^s$ touches the upper bound. It remains to show the inequality
for the linearity intervals. Let $[t_\nu, t_{\nu+k}]$ be a linearity interval of $F^s$. Then obviously
$[t_{\nu+1}, t_{\nu+k}]$ is a constancy interval for $f^s$. Furthermore it follows from the stationarity
of $F^s$ that $a$, $b$ as defined in the algorithm are zero. This is equivalent to

$$\sum_{l=\nu}^{\nu+k} (QF^s)_l = 0, \qquad \sum_{l=\nu}^{\nu+k} t_l (QF^s)_l = \frac{1}{n} \sum_{l=\nu}^{\nu+k} l \, (QF^s)_l = 0 \quad (t_l = l/n),$$

which  implies  that 

i  i  i 

J2(QX)i  =  (QMx)i,  '£i(QX)i  =  '£(QM  x)t 

i=  1  i—v  i—  1 

for arbitrary $X \in \mathbb{R}^n$ and $x = DX$. So our conditions are

u-\-k  v+k 

0 QmFs)v  =  0,  =  0  ^  Y.  ( QMF+  =  0. 

i—v  i=u+l 

This  case  was  proved  by  Lowendick  [6].  □ 
Abstract

Given  data  on  multiple  variables  we  present  a  method  for  fitting  a  function  to  the  data 
which,  unlike  conventional  regression,  treats  all  the  variables  on  the  same  basis  i.e.  there 
is  no  distinction  between  dependent  and  independent  variables.  Moreover,  all  variables 
are  permitted  to  have  error  and  we  do  not  assume  any  information  is  available  regarding 
the  errors.  The  aim  is  to  generate  law-like  relationships  between  variables  where  the  data 
represent  quantities  arising  in  the  natural  and  social  sciences.  Such  relationships  are 
referred  to  as  structural  or  functional  models.  The  method  requires  that  a  (monotonic) 
relationship  exists;  thus,  in  the  two  variable  case  we  do  not  allow  cases  where  there  is 
zero  correlation.  Our  fitting  criterion  is  simply  the  sum  of  the  products  of  the  deviations 
in each dimension and so corresponds to a volume, or more generally a hyper-volume.
One  important  advantage  of  this  criterion  is  that  the  fitted  models  will  always  be  units 
(i.e.  scale)  invariant.  We  formulate  the  estimation  problem  as  a  fractional  programming 
problem.  We  demonstrate  the  method  with  a  numerical  example  in  which  we  try  and 
uncover  the  coefficients  from  a  known  data-generating  model.  The  data  used  suffers  from 
multicollinearity  and  there  is  preliminary  evidence  that  the  least  volume  method  is  much 
more  stable  against  this  problem  than  least  squares. 

1  On  the  undeserved  ubiquity  of  least  squares  regression 

In  fitting  a  function  to  data,  conventional  regression  requires  one  variable  to  be  ‘special’ 
—  this  is  the  dependent  variable.  In  the  sciences  however,  one  often  wishes  to  re-arrange 
the  model  equation  by  changing  the  subject  of  the  formula.  Statisticians  tell  us  that 
in  that  case  we  should  carry  out  a  second  regression.  Yet  scientists  are  uncomfortable 
with  having  separate  models  for  each  variable,  which  are  not  equivalent  to  each  other 
and  yet  are  meant  to  represent  the  same  relationship.  Calibration  is  another  case  where 
one  would  like  mutual  equivalence:  e.g.  in  psychology  one  can  have  two  tests  that  are 
intended  to  measure  the  same  ability:  a  formula  or  conversion  table  is  required  to  relate 
the  score  on  one  test  to  that  on  the  other. 

Another case where regression is inappropriate is where one wants to deduce a parameter
such as the rate of change (slope). If both variables are subject to error then ordinary
least squares will under-estimate the slope, and regressing x on y will over-estimate
it. A simple example involves plotting galaxy speed (or redshift) against distance from
the observer. The slope of the fitted line gives what is called the Hubble constant, whose
value crucially determines the future of the universe: will it continue expanding or will
it  eventually  begin  to  collapse  in  on  itself?  Conventional  regression  gives  different  values 
for  the  Hubble  constant  depending  on  which  variable  is  treated  as  being  dependent,  but 
there  is  no  apparent  reason  for  choosing  one  variable  as  against  the  other. 

An  oft-cited  reason  for  using  least  squares  fitting  is  that  under  certain  assumptions 
on  the  errors,  it  will  provide  the  best  linear  unbiased  estimate  (‘BLUE’)  of  the  slope. 
This  is  the  Gauss-Markov  theorem,  where  ‘best’  is  taken  to  mean  minimum  variance. 
What  is  not  widely  appreciated  is  that  ‘linear’  here  refers  not  to  the  form  of  the  fitted 
model,  but  rather  that  the  expression  for  the  estimated  coefficient  be  linear  in  y.  One 
can  find  estimators  with  even  lower  variance  by  removing  this  non-essential  condition 
e.g.  other  Lp-norm  estimators  are  not  linear  in  y. 

In multiple regression it is widely, and mistakenly, believed that the fitted coefficients
tell us the contribution that a particular variable makes to the dependent variable.
In fact, not even the sign of the coefficient can be relied upon to tell us the direction
of the relationship i.e. a particular x-variable may be positively correlated with the
y-variable, and yet have a negative coefficient in the regression model. This is the problem
of  multicollinearity:  if  there  are  near-linear  relations  among  the  explanatory  variables 
then  the  coefficients  produced  by  regression  will  not  only  be  highly  uncertain  (large 
standard  error)  but  also  not  be  open  to  sensible  interpretation. 

We shall present a technique for model-fitting which treats all variables on the same
basis. The method has the important property of being units-invariant; this is an advantage
not shared by the total least squares approach (also known as orthogonal regression),
and arises from the fact that we use the product of the deviations in each direction rather
than the sum (or sum of squares) when calculating the fitting criterion.

2  The  least  areas  criterion 

Consider  a  set  of  data  points  in  two  dimensions  as  in  Figure  1.  By  drawing  the  vertical 
and  horizontal  deviations  from  the  line  we  create  a  right-angled  triangle  for  each  data 
point.  Our  fitting  criterion  is  simply  to  minimise  the  sum  of  these  areas.  A  key  advantage 
of  this  approach  is  that  changing  the  units  of  measurement  will  not  affect  the  resulting 
line.  In  other  words  it  is  a  scale  invariant  method.  Furthermore  we  can  add  a  constant 
to  either  variable  and  the  geometry  is  such  that  the  line  merely  gets  shifted  vertically  or 
horizontally.  Combining  the  scale  and  translation  invariance  implies  that  the  least  areas 
line  is  invariant  to  linear  transformations  of  the  data.  It  is  also  apparent  that  switching 
the  axes  has  no  effect:  the  variables  are  treated  symmetrically.  (A  textbook  discussion 
of  this  method  appears  in  Draper  and  Smith  [5].) 
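
The criterion just described can be written down directly. The sketch below is only an illustration under stated assumptions: the function name `sum_of_areas` and the representation of the line as y = intercept + slope * x are choices made here, not notation from the paper.

```python
def sum_of_areas(points, intercept, slope):
    """Sum of the right-angled triangle areas between the data points and
    the line y = intercept + slope * x (hypothetical helper for illustration)."""
    if slope == 0:
        raise ValueError("a horizontal line has no finite horizontal deviation")
    total = 0.0
    for x, y in points:
        vertical = y - (intercept + slope * x)  # deviation parallel to the y-axis
        horizontal = vertical / slope           # deviation parallel to the x-axis, up to sign
        total += 0.5 * abs(vertical * horizontal)
    return total
```

Rescaling x by a factor s (and the slope by 1/s) multiplies the criterion by s for every candidate line, so the ranking of lines, and hence the minimiser, is unchanged. Swapping the roles of x and y leaves the value itself unchanged, since each triangle is the same triangle viewed from the other axis.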

We note that it is essential that there be a non-zero correlation in the data otherwise
the method fails. For those seeking to quantify relationships between data variables in the
experimental sciences this would hardly seem to be a restrictive requirement. However
for those working in the area of design and who are concerned with geometrical shapes,
it does rule out the fitting to data scattered around a vertical or horizontal line, or circle,
or rectangle with sides parallel to the co-ordinate axes etc. We shall not discuss fitting
curves in this paper but we note that this method is not suitable for fitting a relationship
that is not monotone over the range of the data i.e. there cannot be maxima or minima
over the data range otherwise the area deviation associated with a given point may not
be uniquely specified. Such problems may be avoided by breaking up the data set into
subsets at the optima and fitting a monotone function to each subset, thus producing a
piecewise monotone function.

Fig. 1. Sum of areas to be minimised in least area calculation.

The least areas method has an interesting history: it has surfaced under different
guises  in  diverse  research  literatures  throughout  the  twentieth  century.  In  astronomy  it 
is  known  as  Stromberg’s  impartial  line.  In  biology  it  is  the  line  of  organic  correlation. 
In  economics  it  is  the  method  of  minimised  areas  or  diagonal  regression.  In  statistics 
it  is  sometimes  referred  to  as  the  ‘standard  or  reduced  major  axis’.  This  derives  from 
the  fact  that  if  the  data  are  standardised  by  dividing  by  their  standard  deviation,  then 
the  fitted  line  corresponds  to  the  major  (i.e.  principal)  axis  of  the  ellipse  of  constant 
probability  for  the  bivariate  normal  distribution.  Yet  another  name  for  this  technique  is 
the  geometric  mean  functional  relationship.  This  is  because  the  slope  has  a  magnitude 
equal  to  the  geometric  mean  of  the  two  slopes  arising  from  ordinary  least  squares  (OLS) 
(proved in Barker, Soh and Evans [2], and Teissier [20]) i.e. if we regress y on x and get
a slope $b_1$ and then regress x on y (so as to minimise the sum of squared deviations in
the x-direction) and obtain a regression line $y = a_2 + b_2 x$ then the geometric mean slope
is $b = (b_1 b_2)^{1/2}$. It is interesting to note that the two OLS slopes are connected via the
correlation $r$ between the variables:

$$b_1 / b_2 = r^2.$$


This  implies  that  as  the  correlation  falls  the  disagreement  between  the  two  OLS  slopes 
increases;  for  example,  even  with  a  correlation  as  high  as  0.71  one  of  these  slopes  will 
be  twice  as  large  as  the  other!  It  also  follows  that  the  magnitude  of  the  slope  of  the 
least  areas  line  lies  between  those  of  the  two  OLS  lines.  This  is  intuitively  satisfying  in  a 
technique  that  aims  to  treat  x  and  y  deviations  symmetrically.  Specifically,  for  the  case 
of positive but imperfect correlation, we have $b_2 > b > b_1$ because $b/r > b > rb$.

From  the  geometric  mean  property  and  the  expressions  for  OLS  slopes  one  can  deduce 
that  the  magnitude  of  the  slope  of  the  least  areas  line  takes  on  a  particularly  simple  closed 
form:  it  is  the  standard  deviation  of  y  divided  by  the  standard  deviation  of  x.  The  sign 
of  the  slope  is  provided  by  the  sign  of  the  correlation  between  y  and  x. 
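
The closed form and its relation to the two OLS slopes can be checked numerically. The following is a minimal sketch, not code from the paper: the name `least_areas_line` is hypothetical, and the intercept is chosen so that the line passes through the means of the data, which is an assumption consistent with the standardised form of the line but not spelled out in the paragraph above.

```python
from math import sqrt, copysign

def least_areas_line(xs, ys):
    """Return (intercept, slope, b1, b2, r) for paired samples xs, ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0 or sxy == 0:
        raise ValueError("the method needs a non-zero correlation")
    r = sxy / sqrt(sxx * syy)             # sample correlation
    b1 = sxy / sxx                        # OLS slope from regressing y on x
    b2 = syy / sxy                        # slope dy/dx of the line from regressing x on y
    slope = copysign(sqrt(syy / sxx), r)  # sd(y)/sd(x) with the sign of r
    intercept = mean_y - slope * mean_x   # assumption: line through the means
    return intercept, slope, b1, b2, r
```

For positively correlated data the returned values satisfy `b1 <= slope <= b2`, with `abs(slope)` equal to `sqrt(b1 * b2)` and `b1 / b2` equal to `r ** 2` up to rounding.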

Numerical  experiments  have  been  carried  out  to  compare  this  fitting  technique  against 
five  others  (Babu  and  Feigelson  [1]).  A  specified  underlying  model  was  used  to  generate 
data (mostly bivariate normal samples) and the aim was to see which method could
best  recover  the  slope  of  the  model.  The  simulations  involved  varying  the  sample  size, 
correlation  and  variances.  Orthogonal  regression  gave  the  poorest  accuracy.  There  were 
two  methods  that  came  out  with  highest  accuracy:  the  least  areas  method  and  the  least 
squares  bisector.  The  latter  bisects  the  smaller  angle  formed  between  the  two  OLS  lines. 
Unfortunately  the  OLS  bisector  is  not  units  invariant  and  so  does  not  suit  our  purposes 
(Ricker  [17]). 

Turning  now  to  applications,  the  method  seems  to  have  appeared  most  often  in  the 
field of biometrics (the application of statistics to biological data). For example, in relating
the size of one body part to another (or to the total weight or height) in humans
and other animals, one may collect data from an individual at successive stages in their
growth, or from many individuals at different points in their development. It is not generally
possible to distinguish between dependent and independent variables in such a
context.  Isometric  growth  is  the  special  case  where  the  two  body  parts  grow  such  that 
their  size  ratio  remains  constant.  Miller  and  Kahn  [13]  argue  in  favour  of  our  method 
thus:  ‘there  is  usually  no  clear  justification  for  saying,  e.g.  that  increase  in  skull  length  is 
dependent  upon  increase  of  body  length;  it  is  more  realistic  to  consider  changes  in  skull 
length  and  body  length  as  due  to  a  set  of  common  factors’.  Ricker  [16]  discusses  the  value 
of  the  method  in  fishery  research.  Applications  include  modelling  relationships  between 
weight  and  length,  between  weight  and  fecundity  (the  number  of  eggs),  and  estimating 
the  ‘catchability’  of  fish  (the  fraction  of  the  stock  taken  by  one  unit  of  fishing  effort). 
Rayner  [15]  gives  an  application  to  the  flight  speed  of  birds  as  related  to  the  windspeed. 

We  have  already  noted  the  scope  for  application  in  astronomy.  Babu  and  Feigelson 
[1]  point  out  that  ‘differences  in  regression  methods  on  similar  data  may  be  responsible 
for  a  portion  of  the  long-standing  controversy  over  the  value  of  Hubble’s  constant,  which 
quantifies  the  recession  rate  of  the  galaxies’.  The  earliest  appearance  of  our  method  in 
the  astronomical  literature  seems  to  be  that  of  Stromberg  [19]. 

The method has also been proposed in the context of educational and psychological
testing, a very early reference being that of Otis [14] who named it the ‘relation line’.
If  two  tests  are  meant  to  measure  the  same  aptitude  or  attainment  one  may  need  to 
match  pairs  of  equivalent  scores  on  the  pair  of  tests  for  creating  a  conversion  table.  The 
direction  of  the  conversion  should  obviously  not  affect  which  values  are  paired  off,  hence 
the  need  for  a  symmetric  approach.  Greenall  [7]  proposes  the  ‘equivalence  line’  for  this 
purpose: 

$$\frac{y - \mu_y}{\sigma_y} = \frac{x - \mu_x}{\sigma_x}$$

This  turns  out  to  be  yet  another  name  for  our  least  areas  line.  For  standardised  scores 
the line equation reduces to y = x. He also proves a very interesting uniqueness result:
‘When we seek a relation that will deem a pair of scores mutually equivalent if and only
if the proportion of x-scores less than X equals the proportion of y-scores less than Y, we
aim at pairing off scores that give rise to equal percentile ranks. In the case of continuous
bivariate distributions which satisfy a simple condition [F(x, y) = F(y/c, cx)], only the
equivalence relation will provide this relation’. The normal distribution is one case which
satisfies this condition. A relevant theoretical result due to Kruskal [12] is that if the two
variables are normally distributed and a line is needed to predict x from y as often as y
from x, then the least areas line maximises the probability of correct prediction (i.e. the
probability of being within z standard deviations, for any given z-value). This provides
another  justification  for  the  use  of  this  line. 

Hirsch  and  Gilroy  [9]  show  how  it  can  be  useful  in  hydrology  and  geomorphology 
where  one  may  be  interested  in  relationships  between  e.g.  stream  slope  versus  elevation, 
or stream length versus basin area, etc. ‘In such cases there is no clear direction of
causality but there is clearly an inter-relation of variables’. ‘A major motivation for the
use of the line lies in the equivalence of the cumulative function of $y$ and $y_{est}$’.

In  general  terms  when  should  the  least  areas  method  be  used?  Rayner  [15]  cites  the 
result  of  Kendall  and  Stuart  [10]  that  if  no  error  information  is  available  then  this  method 
gives the least-bias or maximum likelihood estimate of the functional relation. Rayner
goes  on  to  demonstrate  that  this  line  also  has  the  property  of  being  independent  of  the 
correlation  between  the  errors  of  the  two  variables. 

Ricker  [17]  deals  with  the  question  of  usage  by  first  distinguishing  between  random 
measurement  error  and  mutual  natural  variability  (as  arises  for  example  in  biology). 
In  the  former  case  for  each  observation  there  is  an  associated  true  point  which  would 
arise  if  the  errors  in  both  variables  were  zero.  If  one  can  estimate  the  variances  of  the 
errors  by  replicating  the  measurements  then  measurement  error  models  can  be  used  to 
estimate  the  line.  One  monograph  on  such  models  is  Cheng  and  Van  Ness  [4].  If  one 
cannot estimate the error variances (or their ratio, $\lambda$) then Ricker recommends the use
of the least areas line as being the best approximation: it gives y and x equal weight and
will be exact if $\lambda = \mathrm{var}(y)/\mathrm{var}(x)$, i.e. when the ratio of error variances equals the ratio of
data  variances.  For  the  case  of  mutual  natural  variability  ‘there  is  no  basis  for  assigning 
separate  vertical  and  horizontal  components  to  the  deviation’,  i.e.  ‘it  is  impossible  to 
say  whether  it  is  y  or  x  that  is  responsible  for  the  deviations  from  the  line’.  In  this  case 
Ricker  concludes  that  if  the  data  are  binormally  distributed  then  the  least  areas  line  be 
used  to  describe  the  central  trend,  and  least  squares  to  estimate  one  variable  from  the 
other.  For  the  mixed  case  i.e.  having  both  measurement  error  and  natural  variability, 
‘the  best  that  can  be  done  is  to  treat  them  in  terms  of  whichever  source  of  variation 
makes  the  larger  contribution  to  the  total.  In  biological  work  this  will  usually  be  natural 
variability’.

Despite  appearing  in  so  many  other  fields,  it  is  remarkable  that  this  technique  does  not 
seem  to  have  appeared  in  the  numerical  analysis/approximation  literature.  For  example  it 
is  not  listed  in  Grosse’s  Algorithms  for  Approximation  catalogue.  The  present  paper  looks 
at  an  obvious  way  of  extending  the  approach  to  any  number  of  variables  by  minimising 
volumes. 

3  Least  volume  fitting 

We now intend to fit a linear function of the form $\sum_{j=1}^{p} a_j x_j = c$ to data $\{X_i\}$ in $p$
dimensions, in other words we have data on $p$ variables and we seek a linear relationship
between them. Of course this is not uniquely specified because we can divide through by
any non-zero constant. Thus we are free to impose a constraint on the coefficients, such
as $c = 1$. Note that we shall not permit any of the coefficients $a_j$ to be zero because that
would imply the associated variable $x_j$ is unrelated to the other variables.

One  obvious  way  of  generalising  the  least  areas  procedure  to  higher  dimensions  is  to 
minimise  the  volumes  (or  hypervolumes).  Each  data  point  will  have  associated  with  it  a 
‘volume deviation’ which is simply the product of its deviations from the fitted plane in
each dimension. We must take care to make all these non-negative by taking the absolute
values. For the $i$th data point this volume deviation $V_i$ is proportional to

$$\frac{\left| \sum_{j=1}^{p} a_j X_{ij} - c \right|^p}{\prod_{j=1}^{p} a_j}.$$

We now introduce non-negative variables $u_i$, $v_i$ to deal with the absolute value of the
numerator. The positive $u_i$ represent points on one side of the fitted plane, and positive
$v_i$ refer to points on the opposite side. Setting $c = 1$ allows us to model the bracketed
term thus:

$$u_i - v_i = \sum_{j} a_j X_{ij} - 1.$$

At least one of each of the pairs $\{u_i, v_i\}$ will be forced to be zero by their presence in
the objective function which is being minimised. Consequently the numerator can be
represented as $(u_i + v_i)^p$. We shall suppose the denominator is positive; if it is not we
can always make it so by multiplying one of the x-variables by -1 so that its coefficient,
and hence that of the product of coefficients, also changes sign. We can now formulate
our problem as the following fractional programme:

$$\text{Minimise} \quad \sum_{i} (u_i + v_i)^p \Big/ \prod_{j} a_j$$

$$\text{such that} \quad u_i - v_i = \sum_{j} a_j X_{ij} - 1$$

$$\text{and} \quad u_i, v_i \ge 0.$$

The  field  of  fractional  programming  is  comprehensively  covered  by  Stancu-Minasian 
[18].  We  note  that  Draper  and  Yang  [6]  used  a  different  route  to  generalising  the  technique 
to  more  than  two  dimensions.  They  minimised  the  pth  root  of  the  squared  volumes  and 
showed  that  the  estimated  coefficients  were  a  convex  combination  of  those  from  the  p 
OLS  estimates. 
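
To make the least volume objective concrete, the sketch below evaluates it for a given coefficient vector with the normalisation c = 1. It is an illustration only: the name `volume_objective` is hypothetical, the absolute value of the product of coefficients is used in the denominator (the text instead assumes that product is positive), and no attempt is made to solve the fractional programme itself.

```python
def volume_objective(a, X):
    """Sum over data points of |sum_j a_j * x_ij - 1|**p / |prod_j a_j|,
    i.e. the least volume criterion with c = 1 (hypothetical helper)."""
    p = len(a)
    denom = 1.0
    for aj in a:
        denom *= aj
    if denom == 0:
        raise ValueError("no coefficient may be zero")
    total = 0.0
    for row in X:
        residual = sum(aj * xj for aj, xj in zip(a, row)) - 1.0
        total += abs(residual) ** p
    return total / abs(denom)
```

Changing the units of one variable by a factor s, and dividing its coefficient by s, multiplies the objective by s for every candidate plane. The ratio of the objective between any two candidate planes is therefore unaffected, which is the units invariance claimed for the method.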

4  Numerical  test 

We  shall  now  apply  the  least  volume  criterion  to  try  and  uncover  the  coefficients  from 
data  that  have  been  generated  from  a  known  underlying  model  with  some  randomness 
thrown in. In order to make this a difficult test we shall choose data which suffer from
multicollinearity.  This  means  that  there  is  a  near  linear  dependence  within  the  data,  i.e., 
one  of  the  variables  almost  lies  in  the  space  spanned  by  the  remaining  variables,  and  so 
we  are  close  to  being  rank-deficient.  The  data  is  taken  from  Belsley’s  [3]  comprehensive 
monograph  on  collinearity.  The  generating  model  is 

$$y = 1.2 - 0.4x_1 + 0.6x_2 + 0.9x_3 + \epsilon$$

with $\epsilon$ normally distributed with zero mean and variance 0.01. The absolute correlations
between  the  variables  ranged  from  0.35  to  0.61  and  so  these  in  themselves  would  not 
have  alerted  the  researcher  to  any  difficulty  associated  with  multicollinearity.  Two  very 
similar  data  sets  (A,B)  are  tabulated  in  Belsley  based  on  this  model.  For  set  A  ordinary 
least  squares  (OLS)  gives: 

$$y = 1.26 + 0.97x_1 + 9.0x_2 - 38.4x_3.$$

The fit as measured by $R^2$ is very good at 0.992 but the underlying model is far from
being uncovered. In particular, the coefficient of $x_2$ is 15 times too high and two of the
coefficients  have  the  wrong  sign!  Getting  the  signs  wrong  is  very  serious  if  one  is  trying  to 
understand  how  variables  are  related  to  each  other.  Turning  to  the  least  volume  approach 
we  find: 

$$y = 1.20 - 0.43x_1 + 0.37x_2 + 1.97x_3.$$

We  now  have  all  the  correct  signs  and  the  magnitudes  are  much  closer  to  the  true  ones. 
Repeating  this  for  data  set  B: 

OLS: $y = 1.275 + 0.25x_1 + 4.5x_2 - 17.6x_3$

Least volume: $y = 1.20 - 0.43x_1 + 0.37x_2 + 1.98x_3$.

Once  again  the  least  volume  approach  produces  a  superior  model.  Moreover  it  is  also 
worth noting that the two OLS models are very different from each other whereas the
least volume models seem to be more stable to small variations in the data. This is
noteworthy because of how similar the two data sets were: the y-values were identical
for sets A and B, and the x-values never varied by more than one in the third digit.

Thus our method seems to be much more stable than OLS. Of course a comprehensive
set of Monte Carlo simulations is required to fully explore this aspect.

5  Conclusion 

We  have  presented  a  fitting  method  whose  criterion  combines  the  deviations  in  each 
dimension  by  multiplying  them  together.  This  simple  device  means  that  re-scaling  of 
any  of  the  variables  e.g.  by  changing  the  units  of  measurement,  will  give  rise  to  an 
equivalent model. This property of units-invariance is not shared by the total least
squares approach (or orthogonal regression, where the sum of the squared perpendicular
distances to the fitted plane is minimised). By taking the product of the deviations we ensure
that  all  variables  are  treated  on  the  same  basis  and  this  is  useful  if  the  purpose  is  to  find 
an  underlying  relationship  rather  than  to  predict  one  of  the  variables. 

When we applied the technique to data we were able to recover the underlying relationship
much more closely than when least squares was used. Not only were the signs
of the coefficients correctly reproduced (which is crucial for understanding directions of
change) but also the magnitudes were much closer to the true values than least squares
estimates. It appears that the least volume method may be superior when there is
multicollinearity in the data. Much more simulation needs to be done to investigate this
potentially  very  valuable  feature. 
Abstract

Of  interest  here  is  the  problem  of  fitting  a  curve  or  surface  to  given  data  by  minimizing 
some  norm  of  the  distances  from  the  points  to  the  surface.  These  distances  may  be 
measured  orthogonally  to  the  surface,  giving  orthogonal  distance  regression,  and  for  this 
problem,  the  least  squares  norm  has  attracted  most  attention.  Here  we  will  look  at  two 
other important criteria, the $l_1$ norm and the Chebyshev norm. The former is of value
when  the  data  contain  wild  points,  the  latter  in  the  context  of  accept/reject  criteria. 
There  are  however  circumstances  when  it  is  not  appropriate  to  force  the  distances  to 
be  orthogonal,  and  two  possibilities  of  this  are  also  considered.  The  first  arises  when 
the  distances  are  aligned  with  certain  fixed  directions,  and  the  second  when  angular 
information  is  available  about  the  measured  data  points.  For  the  least  squares  norm,  we 
will  consider  some  algorithmic  developments  for  these  problems. 


1  Introduction 

Of  interest  here  is  the  problem  of  fitting  to  given  data  a  curve  or  surface  which  depends  on 
a vector $a \in \mathbb{R}^n$ of parameters. The underlying approach is such that (1) a point on the
surface  is  associated  with  each  data  point,  (2)  the  fit  of  the  surface  is  measured  by  a  norm 
of  the  vector  whose  components  are  the  distances  between  each  pair  of  corresponding 
points,  (3)  the  (correct)  Gauss-Newton  steps  in  a  are  used  as  a  basis  for  minimizing 
this  norm.  The  distances  may  be  orthogonal  to  the  surface,  giving  orthogonal  distance 
regression  (ODR),  or  may  be  forced  to  satisfy  some  other  criterion  which  makes  them 
non-orthogonal  in  general.  We  consider  both  situations. 

For the ODR problem, most attention has been given to the least squares norm (e.g.
[5], [8], [9], [16], [17], [22]). Here we will look at two other important criteria, the $l_1$ norm
and  the  Chebyshev  norm.  The  former  is  of  value  when  the  data  contain  wild  points,  the 
latter  in  the  context  of  accept/reject  criteria.  For  the  non-orthogonal  distance  problem 
we  will  restrict  attention  to  the  least  squares  case. 

In terms of a vector $a \in \mathbb{R}^n$ of parameters, the curve or surface may be defined in
two ways, (a) parametrically, when a point $x$ on the surface is given by

$$x = x(a, t),$$

with $t$ the parameters whose values define the particular point, or (b) implicitly, when
the surface is defined by the set of points $x$ satisfying the scalar equation

$$f(a, x) = 0.$$

It is also assumed here that the expressions required in these representations are differentiable
functions of their parameters.
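
As a concrete instance of the two representations, consider a circle in the plane. The example is chosen here for illustration and does not come from the paper: the parameter vector a = (cx, cy, radius) and the function names are assumptions of this sketch.

```python
from math import cos, sin

def circle_parametric(a, t):
    """Point x(a, t) on the circle with centre (a[0], a[1]) and radius a[2]."""
    cx, cy, radius = a
    return (cx + radius * cos(t), cy + radius * sin(t))

def circle_implicit(a, x):
    """Scalar function f(a, x) whose zero set is the same circle."""
    cx, cy, radius = a
    return (x[0] - cx) ** 2 + (x[1] - cy) ** 2 - radius ** 2
```

Every point produced by `circle_parametric` makes `circle_implicit` vanish up to rounding, and both functions are differentiable in a, t and x, as the text requires.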

2  $l_1$ and $l_\infty$ ODR

Consider first the $l_1$ case. Then the problem is

$$\text{minimize} \quad \sum_{i=1}^{m} \| x_i - z_i(a) \|,$$

where the points $z_i(a)$ are the nearest points to $x_i$ on the surface defined by $a$, and
where we will assume throughout that unadorned norms are Euclidean norms. Let

$$\delta_i = \| x_i - z_i(a) \|, \quad i = 1, \ldots, m.$$

Then the problem is effectively now defined in terms of the vector $a$ alone. It is easy
to calculate the correct Gauss-Newton step in $a$, which minimizes

$$\| \delta + \nabla_a \delta \, d \|_1$$

with respect to $d$. Now

$$\nabla_a \delta_i = -\frac{(x_i - z_i(a))^T}{\delta_i} \nabla_a z_i(a), \quad \delta_i \neq 0,$$

so that there are potential problems if any $\delta_i \to 0$. Given the nature of the $l_1$ problem,
we cannot exclude that possibility. In fact although $\delta$ is not a smooth function,
because derivative discontinuities only occur at zero values it is a strong semi-smooth
function, as defined in [12]. Ideas from smooth analysis and from strong semi-smooth
analysis as developed in [11] can then be combined to give a local convergence analysis
for the present problem. Fast local convergence for the usual smooth problem relies on
strong uniqueness [4]; for the $l_1$ norm, this can be interpreted in terms of a requirement
that the sequence of solutions $d_k$ is “well-behaved” in a certain sense [1]. An analogous
requirement  can  be  stated  here. 
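
As a hedged illustration of the gradient formula $\nabla_a \delta_i = -(x_i - z_i(a))^T \nabla_a z_i(a) / \delta_i$, the following Python sketch evaluates it for the simplest surface, a circle with parameters $a = (c_x, c_y, r)$, for which the footpoint is available in closed form. The choice of a circle, the function names and the finite-difference step `h` are assumptions made for this illustration and are not taken from the paper.

```python
import math

def footpoint_circle(x, a):
    """Nearest point z(a) on the circle a = (cx, cy, r) to the data point x."""
    cx, cy, r = a
    ux, uy = x[0] - cx, x[1] - cy
    rho = math.hypot(ux, uy)  # assumes x is not the centre of the circle
    return (cx + r * ux / rho, cy + r * uy / rho)

def delta(x, a):
    """Orthogonal distance delta = ||x - z(a)||."""
    z = footpoint_circle(x, a)
    return math.hypot(x[0] - z[0], x[1] - z[1])

def grad_delta(x, a, h=1e-6):
    """Gradient of delta via -(x - z)^T (dz/da) / delta, valid for delta != 0.

    The Jacobian dz/da is approximated column by column with central differences.
    """
    z = footpoint_circle(x, a)
    d = delta(x, a)
    grad = []
    for k in range(3):
        a_plus, a_minus = list(a), list(a)
        a_plus[k] += h
        a_minus[k] -= h
        zp = footpoint_circle(x, a_plus)
        zm = footpoint_circle(x, a_minus)
        dz = ((zp[0] - zm[0]) / (2 * h), (zp[1] - zm[1]) / (2 * h))
        grad.append(-((x[0] - z[0]) * dz[0] + (x[1] - z[1]) * dz[1]) / d)
    return grad
```

For a circle the result can be compared with the closed form $-\mathrm{sign}(\rho - r)\,(u_x, u_y, 1)$, where $\rho$ is the distance from the data point to the centre and $u$ the unit vector from the centre to the data point. The division by `d` makes the difficulty as $\delta_i \to 0$ visible.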

Let the current approximation be $a^k$ and let $J_k$ denote the Jacobian matrix $\nabla_a \delta(a^k)$,
assuming this exists. Then the Gauss-Newton step $d^k$ minimizes

$$\|\delta(a^k) + J_k d\|_1.$$

It is well known (see for example [18]) that if $J_k$ has full rank then there always exists
a solution $d^k$ and an index set $Z^k$ containing $n$ indices such that

$$\delta_i(a^k) + e_i^T J_k d^k = 0, \quad i \in Z^k,$$

where $e_i$ is the $i$th coordinate vector. Let $a^*$ be a limit point of the iteration. Then for
$a^k$ close enough to $a^*$, assume that $J_k$ exists and

(i) $\delta(a^k) + J_k d^k$ has exactly $n$ zeros, corresponding to an index set $Z^k$,

(ii) $Z^k = Z^*$, independent of $k$,

(iii) the $n \times n$ matrices whose rows are $e_i^T J_k$, $i \in Z^*$, are bounded away from singularity.

In practice these conditions ensure that $d^k$ is unique, and there is no redundancy in the
zero components. An analysis is given in [21] for both parametric and implicit fitting.
The main result is the following.

Theorem 2.1 [21] Let the Gauss-Newton method produce a sequence $a^k \to a^*$, where
$\delta(a^k)$ has no zero components, and let (i)-(iii) above hold. In the parametric case, assume
that for all $i \in Z^*$, there exists a unique unit normal vector $n_i^*$ (up to change of sign) at
the point $x_i$ on the surface defined by $a^*$. Then the (undamped) Gauss-Newton method
converges to $a^*$ at a second order rate.

The significance of this result is that, for both parametric and implicit fitting, any $\delta_i$
tending to zero is not by itself necessarily an obstacle to good performance of the
Gauss-Newton method in the $l_1$ case. What is more significant is the possibility of very
slow convergence and this has more to do with the number of those zero components
of $\delta$ at a limit point, rather than just their presence. A fundamental requirement for
the condition (ii) is that the number of zero components of $\delta(a^*)$ is $n$. Of course, this
condition is a rather special one, and for many problems, will not be satisfied. When it
fails, convergence is slow (possibly very slow).

Turning now to the $l_\infty$ problem, this can be stated

$$\text{minimize} \; \max_i \|x_i - z_i(a)\|,$$

with $z_i(a)$ defined as before. Again $\delta_i = \|x_i - z_i(a)\|$ is not a smooth function, but a
solution normally occurs in a region where $\delta$ is smooth. Therefore the problem does not
differ significantly from the usual nonlinear minimax problem: the main requirement for
fast local convergence is that at a limit point the norm is attained at $n + 1$ indices [4].
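
To make the two criteria concrete, here is a minimal Python sketch that evaluates the orthogonal distances $\delta_i$ to a circle and the corresponding $l_1$ and $l_\infty$ objectives. The circle is used only because its footpoints are explicit; the function names and the sample data are illustrative assumptions, not part of the paper.

```python
import math

def orthogonal_distances(points, centre, radius):
    """delta_i = ||x_i - z_i|| for a circle: | ||x_i - centre|| - radius |."""
    return [abs(math.hypot(x - centre[0], y - centre[1]) - radius)
            for x, y in points]

def l1_objective(deltas):
    """Sum of the distances: the l_1 ODR criterion."""
    return sum(deltas)

def linf_objective(deltas):
    """Largest distance: the l_infinity (Chebyshev) ODR criterion."""
    return max(deltas)
```

A single wild point changes the $l_\infty$ objective by its full distance while it is only one term among $m$ in the $l_1$ sum, which is consistent with the remark that the $l_1$ norm is of value when the data contain wild points.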

Two  simple  examples  in  2  dimensions  are  given  by  way  of  illustration.  A  standard 
line  search  is  incorporated  to  force  global  convergence,  although  trust  region  methods 
are  a  popular  alternative.  Indeed,  local  convergence  is  the  main  concern  here,  and  we 
have  not  begun  to  address  important  issues  to  do  with  the  development  of  robust  general 
purpose  algorithms. 

Example 2.2 Consider the Späth data set [13] ($m = 7$), and consider fitting an ellipse
defined implicitly, using the $l_\infty$ and $l_1$ norms. The solutions are illustrated in Figure 1,
where the dashed ellipse and dashed lines are the $l_\infty$ solution and corresponding
orthogonal directions, and the solid ellipse and solid lines are the $l_1$ solution and corresponding
directions. Both ellipses were obtained using the Gauss-Newton method starting from
the circle centre (5,5), radius 2, in 4 and 5 iterations respectively for 5 figure accuracy.

Example 2.3 Consider next the GGS data set [6], which has $m = 8$. Similar fits to
those for Example 2.2 are shown in Figure 2. Again the Gauss-Newton method was used
starting from the circle centre (5,5), radius 2, to give convergence in 6 iterations ($l_\infty$)
and 7 iterations ($l_1$).

Fig. 1. $l_1$ and $l_\infty$ fits to Späth data set.

For both these examples $n = 5$, and favourable conditions hold so that there is
quadratic convergence both in the $l_1$ and $l_\infty$ cases. Otherwise, the key to recovering fast
local convergence in the $l_1$ case is to identify $Z^*$ and to reformulate the problem locally
as

$$\text{minimize} \; \sum_{i \notin Z^*} \|x_i - z_i(a)\| \quad \text{subject to} \quad x_i - z_i(a) = 0, \; i \in Z^*. \qquad (2.1)$$

A similar remedy in the $l_\infty$ case is as follows. For a limit point $a^*$ of the iteration, let

$$I^* = \{i : \delta_i(a^*) = \max_j \delta_j(a^*)\}.$$

Then if we can identify $I^*$, $a^*$ solves, for any $j \in I^*$:

$$\text{minimize} \; \delta_j(a) \quad \text{subject to} \quad \delta_i(a) - \delta_j(a) = 0, \; i \in I^* \setminus \{j\}.$$

Example 2.4 Fitting an $l_\infty$ ODR line in $\mathbb{R}^3$ to 100 random data points (equivalent
to finding the circumscribing cylinder of smallest radius) gives slow convergence of the
basic method, because $|I^*| = 3$ and $n = 4$. But once we identify $I^* = \{4, 42, 58\}$, only 5
iterations of the NAG Fortran subroutine E04UCF are required for 6 figure accuracy.

3  Non-orthogonal $l_2$ distance regression

3.1  Using  fixed  directions 

Suppose that the data come from sampling the surface of a manufactured part, using
a coordinate measuring machine with a touch probe. It has been argued by Hulting
[10] that choosing the directions to be the known probe directions $v_i$ (relative to a
fixed frame of reference) not only makes explicit use of the measurement design, but
also complies with traditional fixed-regressor assumptions (enabling standard inference
theory to apply).

Fig. 2. $l_1$ and $l_\infty$ fits to GGS data set.


Let $x_i$, $i = 1, \dots, m$, as usual be the data points, and let $z_i$ be the corresponding
points on the surface reached by travelling along the lines from $x_i$ in the direction $v_i$.
Then we require to minimize $\|\delta\|$ where

$$\delta_i = \|x_i - z_i(a)\|, \quad i = 1, \dots, m,$$

with $z_i(a)$ defined by

$$z_i(a) - x_i = \delta_i v_i, \quad i = 1, \dots, m,$$

where $v_i$ satisfying $v_i^T v_i = 1$ is given for each $i$. In case of ambiguity, the smallest value
of $\delta_i$ is chosen. The basic idea in efficient algorithmic development is again to treat the
problem as one in $a$ alone, which can be solved as before by the Gauss-Newton method
(or variants). Let $a$ be given. Then for each point $x_i$, the point where the line through
$x_i$ in the direction $v_i$ first cuts the surface can be obtained (this calculation replaces the
“footpoint problem” of calculating $z_i(a)$ as the point on the surface in the orthogonal
distance problem), giving $\delta_i$ as a function of $a$. Methods based on Gauss-Newton steps
are developed for the parametric case in [19], [20], and for the implicit case in [7].
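
The calculation that replaces the footpoint problem can be sketched for a circle, where the line through $x_i$ in the direction $v_i$ meets the curve at the roots of a quadratic. This is only an illustrative special case: the function name, the use of a circle and the convention of returning `None` when the line misses the curve are assumptions, not details from [7], [19] or [20].

```python
import math

def directed_distance_circle(x, v, centre, radius):
    """Distance from x to the circle along the unit direction v.

    Solves ||x + t v - centre||^2 = radius^2 for the step t and keeps the root
    of smallest magnitude (the smallest distance in case of ambiguity).
    Returns (delta, z) with delta = |t| and z = x + t v, or None if the line
    through x in the direction v does not meet the circle.
    """
    wx, wy = x[0] - centre[0], x[1] - centre[1]
    b = wx * v[0] + wy * v[1]                 # half the linear coefficient
    c0 = wx * wx + wy * wy - radius * radius  # constant term
    disc = b * b - c0
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    t = min((-b - root, -b + root), key=abs)
    z = (x[0] + t * v[0], x[1] + t * v[1])
    return abs(t), z
```

With this in hand, $\delta_i$ is a function of the circle parameters alone, which is what a Gauss-Newton iteration in $a$ requires.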


By way of illustration, the 2 data sets previously considered in Examples 2.2 and 2.3
are used to fit ellipses defined implicitly with a particular choice of directions $v_i$. The
initial (circles) and final ellipses (together with the data points and the directions $v_i$)
are shown in Figures 3 and 4. The calculations needed respectively 19 and 17 iterations,
reflecting the fact that, unlike the $l_1$ and $l_\infty$ cases, the convergence rate is linear.

Fig. 3. $l_2$ fit to Späth data set: fixed $v_i$.

Fig. 4. $l_2$ fit to GGS data set: fixed $v_i$.

3.2  Using  angular  information 

Berman and Griffiths [2, 3] consider fitting a circle when angular differences between
successively measured data points are known, with applications in physics and archaeology.
This fitting problem has been extended to the case of ellipses and ellipsoids by
Späth in [14, 15] and it is this kind of problem which is of interest here. The methods
of [14] and [15] are based on the alternating algorithm, and while this can be perhaps
surprisingly effective (particularly with a reparameterization of the problem), we consider
here a correct separated Gauss-Newton method similar to that used before. In
addition to (usually) better local convergence properties, standard step-length control
can be incorporated.

To illustrate, consider fitting an ellipse in general position. It is convenient to do this
by allowing the data to rotate, and fitting to those an ellipse in normal position, aligned
with the axes. Let $(x, y)$ denote the components of $\mathbf{x}$. Then we work with the data

$$x_i(\phi) = x_i \cos\phi + y_i \sin\phi, \quad y_i(\phi) = -x_i \sin\phi + y_i \cos\phi,$$

for $i = 1, \dots, m$, where $\phi$ is an unknown parameter. Therefore we require to minimize,
with respect to the 6 parameters $a, b, p, q, \alpha, \phi$, the function

$$\sum_{i=1}^{m} \{(x_i(\phi) - a - p\cos(\alpha + t_i))^2 + (y_i(\phi) - b - q\sin(\alpha + t_i))^2\},$$

where the numbers $t_i$ are given. Because $(\alpha + t_{i+1}) - (\alpha + t_i) = t_{i+1} - t_i$ for each $i$, we
can interpret this as saying that the angular differences are known, with a degree of
freedom given by the parameter $\alpha$. Note that at a solution to this problem, the directions
between pairs of points $(x_i(\phi), y_i(\phi))$ and the corresponding points on the ellipse will
not generally be orthogonal to the ellipse.


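Differentiating the above expression with respect to $a, p, b, q$ and equating to zero gives

$$A_1 \begin{pmatrix} a \\ p \end{pmatrix} = r_1, \qquad (3.1)$$

where

$$A_1 = \begin{pmatrix} m & \sum_{i=1}^{m} \cos(\alpha + t_i) \\ \sum_{i=1}^{m} \cos(\alpha + t_i) & \sum_{i=1}^{m} \cos^2(\alpha + t_i) \end{pmatrix}, \quad r_1 = \begin{pmatrix} \sum_{i=1}^{m} x_i(\phi) \\ \sum_{i=1}^{m} x_i(\phi)\cos(\alpha + t_i) \end{pmatrix},$$

and

$$A_2 \begin{pmatrix} b \\ q \end{pmatrix} = r_2, \qquad (3.2)$$

where

$$A_2 = \begin{pmatrix} m & \sum_{i=1}^{m} \sin(\alpha + t_i) \\ \sum_{i=1}^{m} \sin(\alpha + t_i) & \sum_{i=1}^{m} \sin^2(\alpha + t_i) \end{pmatrix}, \quad r_2 = \begin{pmatrix} \sum_{i=1}^{m} y_i(\phi) \\ \sum_{i=1}^{m} y_i(\phi)\sin(\alpha + t_i) \end{pmatrix}.$$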

Then (3.1) and (3.2) give $(a, b, p, q)$ as functions of $\alpha$ and $\phi$, provided that $A_1$ and $A_2$
are nonsingular: this will be assumed. For given $\alpha$ and $\phi$, we can therefore define the
function to be minimized as

$$F(\alpha, \phi) = \|\delta(\alpha, \phi)\|,$$

where

$$\delta_i = \|w_i\|, \quad i = 1, \dots, m,$$

$$w_i = (x_i(\phi) - a - p\cos(\alpha + t_i), \; y_i(\phi) - b - q\sin(\alpha + t_i))^T,$$

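and with $a, b, p, q$ defined by (3.1) and (3.2). Then we can apply the Gauss-Newton
method to the minimization of $F(\alpha, \phi)$. The basic step $d = (\delta\alpha, \delta\phi)^T$ is given by finding

$$\min_{d \in \mathbb{R}^2} \|\delta + Jd\|, \qquad (3.4)$$

where $J \in \mathbb{R}^{m \times 2}$ has $i$th row given by $\nabla_{\alpha,\phi} \delta_i(\alpha, \phi)$. Now

$$\nabla_{\alpha,\phi} \delta_i(\alpha, \phi) = \frac{w_i^T}{\delta_i} \left(\nabla_{\alpha,\phi} w_i + (\nabla_{a,p,b,q} w_i) M\right), \quad \delta_i \neq 0, \qquad (3.5)$$

where

$$M = \nabla_{\alpha,\phi} \begin{pmatrix} a \\ p \\ b \\ q \end{pmatrix} \in \mathbb{R}^{4 \times 2}.$$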

It is easy to compute $M$ from (3.1) and (3.2), which can be interpreted as identities in
$\alpha$ and $\phi$. The details are omitted, but all the linear systems use just the matrices $A_1$
and $A_2$, and apart from the solution of (3.4) (a least squares problem in two variables),
there remains only evaluation of expressions.
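
The elimination of $a, b, p, q$ through the two $2 \times 2$ systems (3.1) and (3.2) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the systems are solved by Cramer's rule on the assumption that $A_1$ and $A_2$ are nonsingular, and `synthetic_ellipse_data` is a helper added here only to produce test data lying exactly on an ellipse.

```python
import math

def solve_2x2(m11, m12, m21, m22, r1, r2):
    """Solve a 2x2 linear system by Cramer's rule (determinant assumed nonzero)."""
    det = m11 * m22 - m12 * m21
    return (r1 * m22 - m12 * r2) / det, (m11 * r2 - m21 * r1) / det

def eliminate_linear_parameters(xs, ys, ts, alpha, phi):
    """For fixed alpha and phi, return (a, b, p, q) from (3.1)-(3.2) and F."""
    c, s = math.cos(phi), math.sin(phi)
    xr = [x * c + y * s for x, y in zip(xs, ys)]    # x_i(phi)
    yr = [-x * s + y * c for x, y in zip(xs, ys)]   # y_i(phi)
    cs = [math.cos(alpha + t) for t in ts]
    sn = [math.sin(alpha + t) for t in ts]
    m = len(ts)
    a, p = solve_2x2(m, sum(cs), sum(cs), sum(ci * ci for ci in cs),
                     sum(xr), sum(xi * ci for xi, ci in zip(xr, cs)))
    b, q = solve_2x2(m, sum(sn), sum(sn), sum(si * si for si in sn),
                     sum(yr), sum(yi * si for yi, si in zip(yr, sn)))
    deltas = [math.hypot(xi - a - p * ci, yi - b - q * si)
              for xi, yi, ci, si in zip(xr, yr, cs, sn)]
    return (a, b, p, q), math.sqrt(sum(d * d for d in deltas))

def synthetic_ellipse_data(a, b, p, q, alpha, phi, ts):
    """Hypothetical helper: points that lie exactly on the rotated ellipse."""
    c, s = math.cos(phi), math.sin(phi)
    xs, ys = [], []
    for t in ts:
        xr = a + p * math.cos(alpha + t)
        yr = b + q * math.sin(alpha + t)
        xs.append(xr * c - yr * s)   # undo the rotation by phi
        ys.append(xr * s + yr * c)
    return xs, ys
```

Only $\alpha$ and $\phi$ remain as unknowns of the outer iteration, so each Gauss-Newton step solves a least squares problem in two variables.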

Example 3.1 Consider Example 1 from [14], which has $m = 11$. Starting from $\alpha = 0$,
$\phi = 0$, 15 iterations are required to satisfy the stopping criterion $\|d\|_\infty < 0.001$.
The resulting value of $\|\delta\|^2$ is 7.7211, with $a = 2.1253$, $b = -0.1700$, $p = 4.1281$,
$q = 3.0931$, $\alpha = 13.2348°$, $\phi = 34.7309°$.

Example 3.2 Next consider Example 2 from [14], which has $m = 8$. Again starting
from $\alpha = 0$, $\phi = 0$, 9 iterations are required to satisfy the stopping criterion
$\|d\|_\infty < 0.001$. The resulting value of $\|\delta\|^2$ is 4.4946, with $a = 4.3608$, $b = 1.9537$,
$p = 5.3717$, $q = 3.3704$, $\alpha = -0.6215°$, $\phi = 26.3889°$.

4  Conclusions 

We have examined some aspects of fitting curves and surfaces to given data. The
underlying criterion involves associating with each data point a point on the surface and
minimizing some norm of the vector whose components are the distances between pairs
of points. The distances can be orthogonal to the surface, or fixed in some other way. But
the problems have in common that methods based on separated Gauss-Newton steps can
readily be developed.

Abstract

Data-dependent interpolatory techniques can be used in the reconstruction step of a
multiresolution “à la Harten”. These interpolatory techniques lead to nonlinear
multiresolution schemes. When dealing with nonlinear algorithms, the issue of stability
needs to be carefully considered. In this paper we analyze and compare several strategies
for image compression and their ability to effectively control the global error due to
compression.


1  Introduction 

Multiscale  transformations  are  being  used  in  recent  times  in  the  first  step  of  transform 
coding  algorithms  for  image  compression.  Ideally,  a  multiscale  transformation  allows  for 
an  efficient  representation  of  the  image  data,  which  is  then  processed  using  a  (non- 
reversible)  quantizer  and  passed  on  to  the  encoder  which  produces  the  final  compressed 
set  of  data  which  is  ready  to  be  transmitted  or  stored.  Compression  is  indeed  achieved 
during  the  second  and  third  steps:  the  quantization  and  the  encoding  of  the  transformed 
set  of  discrete  data. 

It is quite clear that the properties of the multiscale transformation are most
important in the overall performance of the transform coding algorithm. Until recently, the
multiscale transformations used for image compression were always based on linear filter
banks; however, the nonlinear alternative has been explored lately by various authors
from different points of view, and preliminary results show the alternative to be very
promising [12, 8, 6, 2, 3]. The key question when using, or even designing, a nonlinear
multiscale transformation is that of stability. In order for such transformations to be
useful tools in image coding, it is absolutely necessary to keep a tight control on the
effect of quantization errors in the decoding process.

In this paper we examine the question of stability for nonlinear multiscale
transformations within Harten’s framework for multiresolution [14, 15]. Harten’s framework
is broad enough to include all classical wavelet transformations as particular cases (just
as happens in the Lifting framework of W. Sweldens [17], developed slightly later in
time but independently); however, the design of the multiscale transformation is done
directly on the spatial domain.

The building blocks of Harten’s multiresolution framework are two operators that
connect adjacent resolution levels. The Decimation (or Restriction) operator is a
linear operator which acts as a low-pass filter, extracting low-resolution information from
a discrete data set. The Prediction operator (also Projection) uses low-resolution data to
predict discrete data at a higher resolution level. It is precisely the design of this operator
that distinguishes Harten’s framework from all other multiresolution frameworks. The
prediction operator is based on a consistent Reconstruction technique, and this opens up
a tremendous number of possibilities in the design of multiresolution schemes. The use
of the reconstruction process as a design tool makes it, conceptually, a simple matter
to introduce adaptivity into the multiscale transformation; we only need to make the
reconstruction process data-dependent [5, 4, 14].

This paper is organized as follows. In Section 2 we recall the so-called cell-average
framework, an appropriate setting for image compression, and describe a class of nonlinear
prediction operators obtained by mean-average interpolation [10, 14, 15]. In Section 3
we examine the question of stability for nonlinear multiscale transformations and relate
it to the synchronization of the data-dependent choices made in the encoder and the
decoder. We also include a set of numerical experiments that illustrate the performance
of several nonlinear multiscale transformations.

2  Multiscale  transformations  in  the  cell-average  setting 

Harten’s general framework for multiresolution [15] relies on two operators, Decimation
and Prediction, that define the basic interscale relations. These operators act on finite
dimensional linear vector spaces, $V^j$, that represent the different resolution levels ($j$
increasing implies more resolution),

$$\text{(a)} \; D_j : V^j \to V^{j-1}, \qquad \text{(b)} \; P_j : V^{j-1} \to V^j, \qquad (2.1)$$

and must satisfy two requirements of algebraic nature: $D_j$ needs to be a linear operator
and $D_j P_j = I_{V^{j-1}}$, i.e., the identity operator on the lower resolution level represented
by $V^{j-1}$. For all practical purposes, $V^j$ can be considered as spaces of finite dimensional
sequences.

Using these two operators, a vector (i.e., a discrete sequence) $v^j \in V^j$ can be
decomposed and reassembled as follows:

$$\text{(a)} \; v^j \to \begin{cases} v^{j-1} = D_j v^j \\ e^j = v^j - P_j v^{j-1} \end{cases}, \qquad \text{(b)} \; v^j = P_j v^{j-1} + e^j, \qquad (2.2)$$

where $e^j$ represents the error in trying to predict the $j$th level data, $v^j$, from the low
resolution data $v^{j-1} = D_j v^j$, using the prediction operator $P_j$.

In the cell-average setting, the discrete data are interpreted as the cell-averages of
a function on an underlying grid, which determines the level of resolution of the given
data. The one dimensional case, in which one considers a set of nested dyadic grids on
the interval $[0, 1]$, $\{X^j\}$, $j \geq 0$, of size $h_j = 2^{-j} h_0$,

$$X^j = \{x_i^j\}, \quad x_i^j = i \cdot h_j, \quad i = 0, \dots, N_j, \quad N_j \cdot h_j = 1, \qquad (2.3)$$

is the easiest one to describe, and it is also directly applicable to two-dimensional (2D)
data via tensor product [2, 3] (the cell-average framework in several dimensions and
non-tensor product (unstructured) grids is considered in e.g. [1]).

In this simple one-dimensional setting, the cell-average framework is characterized by
the following decimation operator $D_j$:

$$(D_j v^j)_i = \frac{1}{2}(v^j_{2i-1} + v^j_{2i}), \quad 1 \leq i \leq N_{j-1}, \qquad (2.4)$$

where $N_j$ is the number of equally spaced intervals on $X^j$, the grid on $[0, 1]$ that
represents the $j$th resolution level. The consistency requirement for the prediction operator,
i.e., $D_j P_j = I_{V^{j-1}}$, which is the only necessary requirement for the prediction in Harten’s
framework, becomes then

$$(P_j v^{j-1})_{2i-1} + (P_j v^{j-1})_{2i} = 2 v_i^{j-1}. \qquad (2.5)$$

Observe that (2.4) and (2.5) imply that

$$(P_j v^{j-1})_{2i} + (P_j v^{j-1})_{2i-1} = v^j_{2i} + v^j_{2i-1}.$$

Hence

$$e^j_{2i-1} = v^j_{2i-1} - (P_j v^{j-1})_{2i-1} = (P_j v^{j-1})_{2i} - v^j_{2i} = -e^j_{2i}.$$

Therefore the prediction errors at even and odd grid points on the $j$th level in (2.2)
are not independent. By considering only the prediction errors at (for example) the
odd points of the grid $X^j$, one immediately gets a one-to-one correspondence between
the sets $\{v_i^j\}_{i=1}^{N_j} \leftrightarrow \{\{d_i^j\}_{i=1}^{N_{j-1}}, \{v_i^{j-1}\}_{i=1}^{N_{j-1}}\}$ with $d_i^j = e^j_{2i-1}$ and $v^{j-1} = D_j v^j$. The
one-dimensional multiscale transformation and its inverse can be written as follows,


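$$v^L \to M v^L = (v^0, d^1, \dots, d^L): \quad \left\{ \begin{array}{l} \text{For } j = L, \dots, 1 \\ \quad \text{For } i = 1, \dots, N_{j-1} \\ \qquad v_i^{j-1} = (v^j_{2i} + v^j_{2i-1})/2 \\ \qquad d_i^j = v^j_{2i-1} - (P_j v^{j-1})_{2i-1} \end{array} \right. \qquad (2.6)$$

$$M v^L = (v^0, d^1, \dots, d^L) \to M^{-1} M v^L: \quad \left\{ \begin{array}{l} \text{For } j = 1, \dots, L \\ \quad \text{For } i = 1, \dots, N_{j-1} \\ \qquad v^j_{2i-1} = (P_j v^{j-1})_{2i-1} + d_i^j \\ \qquad v^j_{2i} = 2 v_i^{j-1} - v^j_{2i-1} \end{array} \right. \qquad (2.7)$$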

Observe that since $d_i^j = e^j_{2i-1} = -e^j_{2i}$, the consistency relation (2.5) implies that the
computation of $v^j_{2i}$ in (2.7) is equivalent to

$$v^j_{2i} = 2 v_i^{j-1} - v^j_{2i-1} = (P_j v^{j-1})_{2i} - d_i^j = (P_j v^{j-1})_{2i} + e^j_{2i}. \qquad (2.8)$$

Therefore (2.6) and (2.7) are just the repeated application of the decomposition and
reassembling specified in (2.2)(a) and (2.2)(b). Thus (2.6) defines a multiscale transformation
and (2.7) is the inverse transformation, whether or not the prediction operator is linear.
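
The claim that (2.7) inverts (2.6) for any consistent prediction, linear or not, can be checked with a short Python sketch. The predictor used here, a minmod-limited slope, is an illustrative stand-in chosen for brevity; it is not the ENO predictor of the paper, and all function names are assumptions. Indices are zero-based, so `v[2*i]` plays the role of $v^j_{2i-1}$ and `v[2*i + 1]` that of $v^j_{2i}$.

```python
def minmod(a, b):
    """Return the argument of smaller magnitude if signs agree, else zero."""
    if a * b <= 0.0:
        return 0.0
    return a if abs(a) < abs(b) else b

def predict_left(coarse, i):
    """Hypothetical nonlinear prediction of the left half-cell average of cell i."""
    left = coarse[i] - coarse[i - 1] if i > 0 else 0.0
    right = coarse[i + 1] - coarse[i] if i < len(coarse) - 1 else 0.0
    return coarse[i] - 0.25 * minmod(left, right)

def decompose(v, levels):
    """Direct transformation in the spirit of (2.6): coarsest data and details."""
    details = []
    for _ in range(levels):
        coarse = [(v[2 * i] + v[2 * i + 1]) / 2.0 for i in range(len(v) // 2)]
        details.append([v[2 * i] - predict_left(coarse, i)
                        for i in range(len(coarse))])
        v = coarse
    return v, details[::-1]  # details ordered from coarse to fine

def reconstruct(v0, details):
    """Inverse transformation in the spirit of (2.7)."""
    v = list(v0)
    for d in details:
        fine = []
        for i in range(len(v)):
            left = predict_left(v, i) + d[i]
            fine.extend([left, 2.0 * v[i] - left])  # consistency fixes the right half
        v = fine
    return v
```

The right half-cell value never needs its own detail coefficient: it follows from the coarse average and the left half-cell value, which is the content of (2.8).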

Next, we follow [4, 14, 15] to describe a class of linear prediction operators that leads
to the $(1, M)$ branch of the Cohen-Daubechies-Feauveau family [7], which is biorthogonal
to the box function [11, 15]. This class is also considered in [6] within the lifting
framework, where it is described as a particular case of Donoho’s average interpolation [9].

Given an integer $s \geq 1$, for each $1 \leq i \leq N_{j-1}$ we construct a polynomial, $p_i(x)$, of
degree $2s$ such that

$$\frac{1}{h_{j-1}} \int_{x^{j-1}_{i+l-1}}^{x^{j-1}_{i+l}} p_i(x)\,dx = v^{j-1}_{i+l}, \quad \text{for } l = -s, \dots, s. \qquad (2.9)$$


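There are various ways to prove that $p_i(x)$ in (2.9) always exists and is uniquely
defined by the $2s + 1$ conditions in (2.9) [1, 9, 14]. Then we define

$$(P_j v^{j-1})_{2i} = \frac{1}{h_j} \int_{x^j_{2i-1}}^{x^j_{2i}} p_i(x)\,dx, \quad (P_j v^{j-1})_{2i-1} = \frac{1}{h_j} \int_{x^j_{2i-2}}^{x^j_{2i-1}} p_i(x)\,dx. \qquad (2.10)$$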


The prediction operator defined by (2.10) is data-independent, hence linear, and
it clearly satisfies the consistency relation (2.5). It can be shown that the multiscale
transformations (2.6) and (2.7) for this class of prediction operators turn out to be the
$(1, M = 2s + 1)$ branch of the Cohen-Daubechies-Feauveau family.
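
For $s = 1$ on a uniform grid, carrying out the integrals in (2.10) for the quadratic that matches three neighbouring cell averages gives the closed form $(P_j v^{j-1})_{2i-1} = v_i^{j-1} + (v_{i-1}^{j-1} - v_{i+1}^{j-1})/8$ and $(P_j v^{j-1})_{2i} = v_i^{j-1} - (v_{i-1}^{j-1} - v_{i+1}^{j-1})/8$. This closed form is derived here for illustration and is not written out in the paper. The Python sketch below checks with exact rational arithmetic that it reproduces the half-cell averages of a quadratic; the function names are assumptions.

```python
from fractions import Fraction

def predict_centered(v_left, v_mid, v_right):
    """Centered s = 1 prediction of the two half-cell averages of the middle cell.

    Returns (left half, right half); their mean equals v_mid, so the
    consistency relation (2.5) holds by construction.
    """
    correction = (v_left - v_right) / 8
    return v_mid + correction, v_mid - correction

def cell_average_of_square(lo, hi):
    """Exact average of f(x) = x^2 over the interval [lo, hi]."""
    return (hi ** 3 - lo ** 3) / (3 * (hi - lo))
```

Because the underlying polynomial has degree 2, the prediction is exact for quadratic data, so the detail coefficients of such data vanish away from the boundaries.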

A nonlinear prediction operator is obtained if we construct $p_i(x)$ in a data-dependent
way. An example of a nonlinear multiresolution transformation constructed in this fashion
is considered in [14, 4, 2], where a nonlinear ENO-type technique (Essentially
Non-Oscillatory, see [16]) is used to construct $p_i(x)$. The key idea, which is in essence common
to the approach used in designing nonlinear filter banks, is to avoid using data across
an edge for the prediction step.

The ENO nonlinear technique is better described if we associate to each polynomial
piece $p_i(x)$ a stencil, $S_i$, which is the set of indices of the values used to define $p_i(x)$. In
the linear case $S_i = \{i - s, \dots, i + s\}$; the stencil is independent of the data set $\{v^{j-1}\}$
and, as a consequence, $P_j$ is a linear operator. In the ENO technique described in [16], the
selection of stencil is made in a data-dependent way using the divided differences of the
data as a measure of its smoothness. Large divided differences occur when considering
data across an edge, while divided (or undivided) differences of data on smoother regions
tend to be smaller in size.
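
A data-dependent stencil selection of this kind can be sketched as follows. This is a simplified illustration that grows the stencil one cell at a time and compares undivided differences of the cell averages directly; the function names, the tie-breaking rule (extend to the right) and the boundary handling are assumptions and are not taken from [16] or [4].

```python
def undivided_difference(v, lo, hi):
    """Undivided difference of order hi - lo of the values v[lo..hi]."""
    w = list(v[lo:hi + 1])
    while len(w) > 1:
        w = [w[k + 1] - w[k] for k in range(len(w) - 1)]
    return w[0]

def eno_stencil(v, i, size):
    """Grow a stencil of `size` contiguous indices around i, avoiding edges.

    At each step the stencil is extended to the side whose enlarged stencil
    has the smaller undivided difference in absolute value.
    Returns the pair (lo, hi) of first and last index.
    """
    if size > len(v):
        raise ValueError("stencil larger than the data set")
    lo = hi = i
    while hi - lo + 1 < size:
        can_left, can_right = lo > 0, hi < len(v) - 1
        if can_left and can_right:
            go_left = (abs(undivided_difference(v, lo - 1, hi))
                       < abs(undivided_difference(v, lo, hi + 1)))
        else:
            go_left = can_left
        if go_left:
            lo -= 1
        else:
            hi += 1
    return lo, hi
```

On step data the two cells adjacent to the jump receive stencils that lie entirely on their own side of it, which is the behaviour the prediction step relies on.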

The information contained in the divided differences is then used to decide what is
$S_i$ for each $i$, with the only restriction that $i \in S_i$ (to satisfy the consistency requirement
(2.5)). We follow [4] and consider all polynomial pieces of the same degree. In our case
$\#S_i = 2s$, but in principle one could decide to lower the degree of $p_i(x)$, or that of
some of its neighbours, whenever an edge-detection mechanism finds an edge at the $i$th
interval. By lowering the degree of some polynomial pieces close to an edge, one can
avoid crossing the edge in the prediction step, as much as possible. This option is closely
related to the nonlinear multiscale transformation considered in [6] (within the Lifting
framework), where the nonlinearity comes in from adaptively choosing from the $(1, M)$
family of linear filters.

Once $S_i$ is determined ($i \in S_i$), $p_i(x)$ can be uniquely determined when degree $p_i(x)$ =
$\#S_i$ [1] so that

$$\frac{1}{h_{j-1}} \int_{x^{j-1}_{m-1}}^{x^{j-1}_{m}} p_i(x)\,dx = v_m^{j-1} \quad \text{for } m \in S_i, \qquad (2.11)$$

and the prediction operator is then defined by (2.10).

One can be slightly more ‘sophisticated’ in the design of the polynomial pieces. The
Subcell Resolution technique [4, 13] makes it possible to account for discontinuities within
a cell as follows. If an edge is detected in the $i$th cell, the polynomial piece $p_i(x)$ is discarded
and substituted by its left and right neighbours, $p_{i-1}(x)$ and $p_{i+1}(x)$, assuming that their
respective stencils do not intersect, i.e. $S_{i-1} \cap S_{i+1} = \emptyset$. At a true one-dimensional edge
(a jump) on the $i$th cell, the function

$$F(y) = \frac{1}{h_{j-1}} \int_{x^{j-1}_{i-1}}^{y} p_{i-1}(x)\,dx + \frac{1}{h_{j-1}} \int_{y}^{x^{j-1}_{i}} p_{i+1}(x)\,dx - v_i^{j-1}$$

will have a zero on the $i$th cell [13], say $\eta$, and the location of $\eta$ is used to substitute the
polynomial piece $p_i(x)$ by the discontinuous piecewise polynomial function

$$q_i(x) = \begin{cases} p_{i-1}(x), & x < \eta, \\ p_{i+1}(x), & x > \eta. \end{cases} \qquad (2.12)$$


The prediction operator is again defined by (2.10) at nonsingular cells (cells in which no
edge has been detected), while at the singular cell

$$(P_j v^{j-1})_{2i} = \frac{1}{h_j} \int_{x^j_{2i-1}}^{x^j_{2i}} q_i(x)\,dx, \quad (P_j v^{j-1})_{2i-1} = \frac{1}{h_j} \int_{x^j_{2i-2}}^{x^j_{2i-1}} q_i(x)\,dx.$$

In practice it is unnecessary to compute explicitly the value of $\eta$; only its location with
respect to $x^j_{2i-1}$ is needed, which can be found by a sign check. We refer the reader
to [4] (and references therein) for specific details on this technique, in particular on the
detection mechanism, and on its performance.


3  The question of stability: Error control versus synchronization, with numerical examples


Lossy  coding  schemes  introduce  errors  into  the  transform  coefficients,  and  it  becomes 
crucial  that  the  nonlinearities  do  not  unduly  amplify  these  errors.  In  lossy  compression 
the  decoder  only  has  the  quantized  detail  coefficients.  If  we  use  a  nonlinear  prediction 
operator  (whether  it  is  constructed  as  described  in  the  previous  section  or  based  on 
locally  adapted  filters,  as  in  [6]  within  the  Lifting  framework),  the  quantization  errors  in 
coarse  scales  could  cascade  across  the  scale  ladder  and  cause  a  series  of  incorrect  choices 
(either  on  the  filters  or  on  the  stencils)  leading  to  serious  reconstruction  errors. 

To avoid incorrect choices in the prediction step, whether within Harten’s or the
Lifting framework, one would need to send side information on which filter was used
(Lifting) or what was the interpolatory stencil (Harten’s). This is clearly inappropriate
when trying to design a compression scheme. One way to avoid storing (and sending) side
information is to somehow synchronize the nonlinear prediction operators in the encoder
and the decoder, so as to ensure that at a given spatial location on a given scale, the
prediction operator will select the same stencil (filter bank), both in the encoding and
the decoding steps.

Within  the  Lifting  framework,  synchronization  is  achieved  in  [6]  by  changing  the 
typical  Split-Predict-Update  steps  to  Split-Update-Predict.  In  doing  so,  it  is  possible  to 
base  the  choice  of  predictor  directly  on  already  ‘quantized  data’,  thus  synchronizing  the 
nonlinear  decisions  made  by  the  encoder  and  the  decoder. 

Within  Harten’s  framework,  synchronization  is  just  a  consequence  of  a  strategy  that  is 
designed  to  fully  control  the  compression  error.  Because  the  main  design  tool  in  Harten’s 
framework  for  multiresolution  is  a  reconstruction  technique,  and  because  A.  Harten  had 
already  worked  with  nonlinear  reconstruction  techniques  in  the  context  of  the  numerical 
simulation  for  hyperbolic  conservation  laws,  so-called  Error-Control  (EC)  strategies  can 
be  found  already  in  the  early  papers  of  Harten  on  multiresolution  [14]. 

Harten’s mechanism to control the global accumulated error is based on a modification
of the direct multiscale transformation, $M$, that ensures a prescribed tolerance on the
global prediction errors (explicit error bounds can be found in [4, 13]). The modified
transformation incorporates the quantizer into the direct multiscale transformation in
such a way that the prediction operator in the encoder also acts on already ‘quantized’
data; hence synchronization is achieved because the nonlinear prediction operators both
in $M$ and $M^{-1}$ work on the same set of discrete data at each resolution level.
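
The synchronization idea, that the encoder predicts from data the decoder will also have, can be sketched as follows. This is a deliberately simplified illustration and not the error-control algorithm of [2, 4], which additionally guarantees explicit bounds on the global error. The minmod-based predictor, the quantization of the coarsest level with the same tolerance, the rounding of halves upwards and all function names are assumptions.

```python
import math

def quantize(d, eps):
    """qu(d) = 2*eps*round(d / (2*eps)), with halves rounded up (an assumption)."""
    return 2.0 * eps * math.floor(d / (2.0 * eps) + 0.5)

def minmod(a, b):
    if a * b <= 0.0:
        return 0.0
    return a if abs(a) < abs(b) else b

def predict_left(coarse, i):
    """Hypothetical nonlinear prediction of the left half-cell average."""
    left = coarse[i] - coarse[i - 1] if i > 0 else 0.0
    right = coarse[i + 1] - coarse[i] if i < len(coarse) - 1 else 0.0
    return coarse[i] - 0.25 * minmod(left, right)

def refine(coarse, details):
    """One level of (2.7): rebuild the finer level from coarse data and details."""
    fine = []
    for i, d in enumerate(details):
        left = predict_left(coarse, i) + d
        fine.extend([left, 2.0 * coarse[i] - left])
    return fine

def encode_synchronized(v, levels, eps):
    """Encoder that predicts from quantized data, as the decoder will."""
    pyramid = [list(v)]
    for _ in range(levels):                    # exact cell averages by decimation
        f = pyramid[-1]
        pyramid.append([(f[2 * i] + f[2 * i + 1]) / 2.0
                        for i in range(len(f) // 2)])
    pyramid.reverse()                          # pyramid[0] is the coarsest level
    coarse_q = [quantize(c, eps) for c in pyramid[0]]
    v_hat, all_details = coarse_q, []
    for exact in pyramid[1:]:
        d_hat = [quantize(exact[2 * i] - predict_left(v_hat, i), eps)
                 for i in range(len(v_hat))]
        all_details.append(d_hat)
        v_hat = refine(v_hat, d_hat)           # what the decoder will see
    return coarse_q, all_details, v_hat

def decode(coarse_q, all_details):
    """Decoder: repeated application of (2.7) to the quantized data."""
    v_hat = coarse_q
    for d_hat in all_details:
        v_hat = refine(v_hat, d_hat)
    return v_hat
```

Since both sides apply `predict_left` to identical quantized sequences, any data-dependent choice inside the predictor is made identically in the encoder and the decoder, and no side information is needed.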

To illustrate the effect of the different techniques, we take a particular nonlinear
prediction operator, a third order ENO reconstruction technique with Subcell Resolution,
as described in the last section. We denote by $M_{SR}$ the multiscale transformation
(2.6), while $M^{EC}_{SR}$ denotes the EC modified transform as described in [2, 4], and $M^{S}_{SR}$ a
multiscale transformation in which only synchronization is enforced, as proposed in [6].
The quantization step is carried out as follows:

$$qu(d_i^j) = 2\varepsilon_j \, \text{round}\left[d_i^j/(2\varepsilon_j)\right]$$

and it is incorporated into the direct transformation in $M^{EC}_{SR}$ and $M^{S}_{SR}$ (see [2, 6] for
specific details), while in $M_{SR}$ it is applied to the scale coefficients obtained after the
transformation. In the numerical tests we report, we take $\varepsilon_L = 8$ with $L = 4$ and
$\varepsilon_{j-1} = \varepsilon_j/2$.
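
The quantizer $qu(d) = 2\varepsilon\,\text{round}[d/(2\varepsilon)]$ can be sketched in a few lines of Python. The function name is an assumption, and so is the treatment of exact halves, which are rounded upwards here because the text does not specify a rule.

```python
import math

def quantize(d, eps):
    """Uniform quantizer with step 2*eps: qu(d) = 2*eps*round(d / (2*eps)).

    Halves are rounded up (an assumption). The result differs from d by at
    most eps, and every |d| < eps is mapped to zero.
    """
    return 2.0 * eps * math.floor(d / (2.0 * eps) + 0.5)
```

The bound $|qu(d) - d| \leq \varepsilon$ is what makes $\varepsilon$ act as a tolerance, and the zeros produced for small details are what the subsequent lossless coder exploits.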

We  consider  two  different  images:  the  familiar  image  of  Lena  as  an  example  of  a  ‘real’ 
image,  and  a  purely  geometrical  image,  to  which  texture  has  been  added,  as  in  [6]. 

After the direct transformation (plus the quantization step) has taken place, a lossless
Lempel-Ziv compression algorithm is applied to reduce the size of the transformed image,
then a compression ratio is computed as the number of bits of the compressed
representation over the number of bits of the original image. To recover the original image, we
undo the lossless compression and transform back using (2.7) in all three cases. The full
compression algorithm is identified in each case by an acronym, ‘ST’ for $M_{SR}$, ‘EC’ for
$M^{EC}_{SR}$ and ‘SYNC’ for $M^{S}_{SR}$.

In Tables 1 and 2 we compile a number of quantities that measure the ‘quality’ of the reconstructed image, and therefore the robustness and reliability of each multiresolution-based compression algorithm: the magnitude of the global compression error, measured in various norms, the compression rate $r_c$ and the entropy of the transformed image. The reconstructed images in both cases can be observed in Figures 1 and 2.

Method   $\|\cdot\|_\infty$   $\|\cdot\|_1$   $\|\cdot\|_2$   $r_c$     entropy
ST       258                  5.71            9.08            11.3:1    .6449
SYNC     195                  6.45            9.82            7.9:1     .8875
EC       25.4                 4.47            5.73            9.7:1     .6850

Tab. 1. Geometrical image.

Fig. 1. Geometrical image: (a) original, (b) ST, (c) EC, (d) SYNC.

It can be clearly observed that the absence of any type of synchronization procedure can lead to a very poor reconstructed image. Synchronization alone improves the quality, but it is not as robust as the full EC mechanism, designed in this case to enforce a certain error bound in the 2-norm (as observed in Tables 1 and 2, the 2-norm of the global error is kept below 8). It is worth mentioning that the compression rate and the entropy of the compressed data are all very close; however, the visual quality of the reconstructed image is significantly better for the EC compression algorithm.

Bibliography 

1.  R.  Abgrall  and  A.  Harten.  Multiresolution  representation  in  unstructured  meshes. 
SIAM J. Numer. Anal. 35, 2128-2146 (electronic), 1998.

2.  S.  Amat,  F.  Arandiga,  A.  Cohen,  and  R.  Donat.  Tensor  product  multiresolution 
analysis  with  error  control  for  compact  image  representation.  Submitted  to  Signal 
Processing,  2000. 

3.  S.  Amat,  F.  Arandiga,  A.  Cohen,  R.  Donat,  G.  Garcia,  and  M.  Von  Oehsen.  Data 
compression  with  ENO  schemes.  Applied  and  Computational  Harmonic  Analysis 
11,  273-288,  2001. 

4.  F.  Arandiga  and  R.  Donat.  Nonlinear  multi-scale  decompositions:  The  approach  of 
A.  Harten.  Numer.  Algorith.  23,  175-216,  2000. 

5.  F.  Arandiga,  R.  Donat,  and  A.  Harten.  Multiresolution  based  on  weighted  averages 
of  the  hat  function  II:  Nonlinear  reconstruction  operators.  SIAM  J.  Sci.  Comput. 
20,  1053-1093,  1999. 

6. R. L. Claypoole, G. Davis, W. Sweldens, and R. Baraniuk. Nonlinear wavelet transforms for image coding via lifting scheme. Submitted to IEEE Trans. on Image




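Method   $\|\cdot\|_\infty$   $\|\cdot\|_1$   $\|\cdot\|_2$   $r_c$          entropy
ST       318                  5.66            10.59           [illegible]    .8261
SYNC     277                  [illegible]     [illegible]     [illegible]    .9430
EC       26.4                 [illegible]     [illegible]     [illegible]    .8704

Tab. 2. Lena.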
Abstract

We present a new approach to the construction of biorthogonal wavelet transforms using polynomial splines. The construction is performed in a “lifting” manner, and we use interpolatory as well as local quasi-interpolatory and smoothing splines as predicting aggregates in this scheme. The transforms contain some scalar control parameters which enable their flexible tuning in either the time or the frequency domain. The transforms are implemented in a fast way. They have demonstrated efficiency in application to image compression.

1  Introduction 

Until  recently,  two  methods  have  been  used  for  the  construction  of  wavelet  schemes 
using  splines.  One  is  to  construct  orthogonal  and  semi-orthogonal  wavelets  in  the  spline 
spaces  (Battle-Lemarie  [2,  7],  Chui-Wang  [6],  Unser-Aldroubi-Eden  [12]).  Another  way 
was  introduced  by  Cohen,  Daubechies  and  Feauveau  [3]  who  constructed  symmetric 
compactly  supported  spline  wavelets  whose  duals,  remaining  compactly  supported  and 
symmetric,  do  not  belong  to  a  spline  space.  However,  since  the  introduction  of  the  lifting 
scheme  for  the  design  of  wavelet  transforms  [11],  a  new  way  was  opened  to  use  splines  as 
a  tool  for  devising  a  full  discrete  scheme  of  wavelet  transforms.  Namely,  various  splines 
can  be  employed  as  predicting  aggregates  in  lifting  constructions. 

2  Lifting  scheme  of  biorthogonal  wavelet  transform 

We call the sequences which belong to the space $l_1$ discrete-time signals. The $z$-transform of a signal $\{a(k)\}$ is defined as follows: $a(z) = \sum_{k=-\infty}^{\infty} z^{-k} a(k)$. Throughout the paper we assume that $z = e^{i\omega}$. We introduce a family of biorthogonal wavelet-type transforms that operate on the signal $x = \{x(k)\}_{k=-\infty}^{\infty}$, which we construct through lifting steps.

The  lifting  scheme  for  the  wavelet  transform  of  a  signal  can  be  implemented  in  primal 
or  dual  modes.  For  brevity  we  consider  only  the  primal  mode. 

Decomposition. Generally, the primal lifting scheme for decomposition of signals consists of three steps: 1. Split. 2. Predict. 3. Update or lifting.

Split - We split the array $x$ into even and odd sub-arrays:

$$e_1 = \{e_1(k) = x(2k)\}, \quad d_1 = \{d_1(k) = x(2k+1)\}, \quad k \in \mathbb{Z}.$$

Predict - We use the even array $e_1$ to predict the odd array $d_1$ and redefine the array $d_1$ as the difference between the existing array and the predicted one. To be specific, we apply some filter with transfer function $zU(z)$ to the sequence $e_1$ and predict the function $d_1(z^2)$, which is the $z^2$-transform of $d_1$. The $z^2$-transform of the new $d$-array is defined as follows:

$$d_1^u(z^2) = d_1(z^2) - zU(z)e_1(z^2). \quad (2.1)$$

From now on the superscript $u$ means an update operation of the array. Obviously, the prediction $zU(z)e_1(z^2)$ should approximate $d_1(z^2)$ well.

Lifting - We update the even array using the new odd array:

$$e_1^u(z^2) = e_1(z^2) + \beta(z)z^{-1}d_1^u(z^2). \quad (2.2)$$

Generally, the goal of this step is to eliminate the aliasing which appears while downsampling the original signal $x$ into $e_1$. Further on we will discuss how to achieve this effect by a proper choice of the filter $\beta$.


Reconstruction. The reconstruction of the signal $x$ from the arrays $e_1^u$ and $d_1^u$ is implemented in reverse order: 1. Undo Lifting. 2. Undo Predict. 3. Unsplit.

Undo Lifting - We restore the even array: $e_1(z^2) = e_1^u(z^2) - \beta(z)z^{-1}d_1^u(z^2)$.

Undo Predict - We restore the odd array: $d_1(z^2) = d_1^u(z^2) + zU(z)e_1(z^2)$.

Unsplit - The last step represents the standard restoration of the signal from its even and odd components. In the $z$-domain this is $x(z) = e_1(z^2) + z^{-1}d_1(z^2)$.

The lifting scheme presented above yields an efficient algorithm for the implementation of the forward and backward transform $x \leftrightarrow e_1^u \cup d_1^u$. These operations can be interpreted as a transformation of the signal by a filter bank that possesses the perfect reconstruction property and is associated with biorthogonal pairs of bases in the space of discrete-time signals. These basis signals are the synthesis and analysis wavelets. Further steps of the transform are implemented in an iterative way by the same lifting operations.
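The three decomposition steps and their inverses can be sketched in a few lines. The sketch below is only an illustration under stated assumptions: the signal is treated as periodic and of even length, and the predictor and lifting filters are the simplest possible choices, $U(z) = (z + z^{-1})/2$ and $\beta = U/2$, rather than any of the spline filters constructed later. All function names are hypothetical.

```python
import random

def predict(even):
    """Illustrative FIR predictor: mean of the two neighbouring even samples (periodic)."""
    n = len(even)
    return [0.5 * (even[k] + even[(k + 1) % n]) for k in range(n)]

def update(detail):
    """Illustrative lifting filter: a quarter of the two neighbouring details (periodic)."""
    n = len(detail)
    return [0.25 * (detail[k - 1] + detail[k]) for k in range(n)]

def decompose(x):
    even, odd = x[0::2], x[1::2]                           # Split
    d_u = [o - q for o, q in zip(odd, predict(even))]      # Predict
    e_u = [e + q for e, q in zip(even, update(d_u))]       # Update (lifting)
    return e_u, d_u

def reconstruct(e_u, d_u):
    even = [e - q for e, q in zip(e_u, update(d_u))]       # Undo Lifting
    odd = [d + q for d, q in zip(d_u, predict(even))]      # Undo Predict
    x = [0.0] * (len(even) + len(odd))
    x[0::2], x[1::2] = even, odd                           # Unsplit
    return x

random.seed(1)
signal = [random.uniform(-1.0, 1.0) for _ in range(64)]
smooth, detail = decompose(signal)
restored = reconstruct(smooth, detail)
```

Perfect reconstruction holds here for any choice of the two filters, because each inverse step subtracts or adds exactly the quantity that the forward step added or subtracted.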


3  Polynomial  splines 


We  will  construct  polynomial  splines  of  various  kinds  using  the  even  subarray  of  a  signal, 
calculate  their  values  in  the  midpoints  between  nodes  and  use  these  values  for  prediction 
of  the  odd  array.  In  this  section  we  discuss  some  properties  of  such  splines  and  derive 
the  corresponding  filters  U. 

3.1 B-splines

The central B-spline of first order on the grid $\{kh\}$ is defined as follows:

$$M_h^1(x) = \begin{cases} 1/h & \text{if } x \in [-h/2, h/2], \\ 0 & \text{elsewhere.} \end{cases}$$

The central B-spline of order $p$ is the convolution $M_h^p(x) = M_h^{p-1}(x) * M_h^1(x)$, $p \ge 2$. Note that the B-spline of order $p$ is supported on the interval $(-ph/2, ph/2)$. It is positive within its support and symmetric around zero. The nodes of B-splines of even orders are located at the points $\{kh\}$ and of odd orders at the points $\{h(k+1/2)\}$, $k \in \mathbb{Z}$. It is readily verified that $hM_h^p(hx) = M^p(x)$, where $M^p(x) := M_1^p(x)$. Let

$$u^p := \{hM_h^p(hk) = M^p(k)\}, \quad w^p := \{hM_h^p(h(k+1/2)) = M^p(k+1/2)\}, \quad k \in \mathbb{Z}. \quad (3.1)$$

Due to the compact support of B-splines, these sequences are finite. We will use for our constructions only splines of odd orders $p = 2r + 1$. In Table 1 we present the sequences for initial values of $r$ which are of practical importance.


k                  -3    -2    -1     0     1     2     3
$u^3 \times 8$      0     0     1     6     1     0     0
$u^5 \times 384$    0     1    76   230    76     1     0
$w^3 \times 2$      0     0     1     1     0     0     0
$w^5 \times 24$     0     1    11    11     1     0     0

Tab. 1. Values of the sequences $u^p$ and $w^p$.
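The entries of Table 1 can be recomputed exactly. The sketch below evaluates the central B-spline on the unit grid with the standard truncated-power formula $M^p(x) = \frac{1}{(p-1)!}\sum_{k=0}^{p}(-1)^k\binom{p}{k}(x + p/2 - k)_+^{p-1}$, which is a textbook identity and not a formula stated in this paper; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial

def central_bspline(p, x):
    """Central B-spline M^p(x) of order p >= 2 on the unit grid (truncated-power formula)."""
    x = Fraction(x)
    total = Fraction(0)
    for k in range(p + 1):
        t = x + Fraction(p, 2) - k
        if t > 0:
            total += (-1) ** k * comb(p, k) * t ** (p - 1)
    return total / factorial(p - 1)

def u_seq(p, ks):
    """Samples M^p(k) at the grid points."""
    return [central_bspline(p, k) for k in ks]

def w_seq(p, ks):
    """Samples M^p(k + 1/2) at the midpoints."""
    return [central_bspline(p, Fraction(2 * k + 1, 2)) for k in ks]

ks = range(-3, 4)
table = {
    "u3 x 8": [int(8 * v) for v in u_seq(3, ks)],
    "u5 x 384": [int(384 * v) for v in u_seq(5, ks)],
    "w3 x 2": [int(2 * v) for v in w_seq(3, ks)],
    "w5 x 24": [int(24 * v) for v in w_seq(5, ks)],
}
```

Exact rational arithmetic is used so that the scaled values come out as the integers of the table without rounding.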


We need the $z^2$-transforms of the sequences $u^p$ and $w^p$:

$$u^p(z^2) := \sum_{k=-\infty}^{\infty} z^{-2k}u^p(k), \quad w^p(z^2) := \sum_{k=-\infty}^{\infty} z^{-2k}w^p(k).$$

These functions are Laurent polynomials, and are called the Euler-Frobenius polynomials [10].

Proposition 3.1. ([9]) On the circle $z = e^{i\omega}$ the Laurent polynomials $u^p(z^2)$ are strictly positive. Their roots are all simple and negative. Each root $\zeta$ can be paired with a dual root $\theta$ such that $\zeta\theta = 1$. Thus, if $p = 2r + 1$ is odd, then $u^p(z^2)$ can be represented, up to a positive constant factor, as follows:

$$u^p(z^2) = \prod_{n=1}^{r} \frac{1}{\gamma_n}(1 + \gamma_n z^2)(1 + \gamma_n z^{-2}), \quad 0 < \gamma_n < 1. \quad (3.2)$$

We denote

$$U_i^p(z) := z^{-1}\frac{w^p(z^2)}{u^p(z^2)}. \quad (3.3)$$

Proposition 3.2 The rational functions $U_i^p(z)$ are real-valued and $U_i^p(-z) = -U_i^p(z)$. If $p = 2r + 1$ is odd then

$$1 - U_i^p(z) = \frac{(a-2)^{r+1}\xi_r(a)}{u^p(z^2)}, \quad 1 + U_i^p(z) = \frac{(-a-2)^{r+1}\xi_r(-a)}{u^p(z^2)}, \quad (3.4)$$

where $a := z + z^{-1}$ and $\xi_r(a)$ is a polynomial of degree $r - 1$.

3.2 Interpolatory splines

The shifts of B-splines form a basis in the space $\mathcal{S}_h^p$ of splines of order $p$ on the grid $\{kh\}$. Namely, any spline $S_h^p \in \mathcal{S}_h^p$ has the following representation:

$$S_h^p(x) = h\sum_l q(l)M_h^p(x - lh). \quad (3.5)$$

Let $q := \{q(l)\}$, and let $q(z^2)$ be the $z^2$-transform of $q$. We introduce also the sequences $s^p := \{S_h^p(hk) = S_1^p(k)\}$ and $m^p := \{S_h^p(h(k+1/2)) = S_1^p(k+1/2)\}$ of values of the spline on the grid points and on the midpoints. Let $s^p(z^2)$ and $m^p(z^2)$ be the corresponding $z^2$-transforms. We have

$$S_1^p(k) = \sum_l q(l)M^p(k - l), \quad \text{and} \quad S_1^p\left(k + \tfrac{1}{2}\right) = \sum_l q(l)M^p\left(k - l + \tfrac{1}{2}\right). \quad (3.6)$$

Respectively, $s^p(z^2) = q(z^2)u^p(z^2)$, and $m^p(z^2) = q(z^2)w^p(z^2)$.

From these formulae we can derive an expression for the coefficients of a spline which interpolates a given sequence $e := \{e(k)\}$ at the grid points:

$$S_h^p(hk) = e(k),\ k \in \mathbb{Z} \iff q(z^2)u^p(z^2) = e(z^2) \iff q(z^2) = \frac{e(z^2)}{u^p(z^2)}. \quad (3.7)$$

The $z^2$-transform of the sequence $m^p$ is:

$$m^p(z^2) = q(z^2)w^p(z^2) = zU_i^p(z)e(z^2). \quad (3.8)$$

Our further construction exploits the super-convergence property of the interpolatory splines of odd orders (even degrees).

Theorem 3.3. ([13]) Let a function $f \in L^1(-\infty, \infty)$ have $p + 1$ continuous derivatives and let $S_h^p \in \mathcal{S}_h^p$ interpolate $f$ on the grid $\{kh\}$. Denote $f_k = f((k + 1/2)h)$. Then in the case of odd $p = 2r + 1$, the following asymptotic relation holds:

sph(h(k+ 1/2))  =  A-<i^V(*+2)(MHV2))(2r+l)^j^~^(^  ±o(h2r+2f(2r+2)), 

(3.9) 

where  bs(x)  is  the  Bernoulli  polynomial  of  degree  s. 

Recall that, in general, the interpolatory spline of order $2r + 1$ approximates the function $f$ with accuracy of $h^{2r+1}$. Therefore, we may claim that $\{(k + 1/2)h\}$ are points of super-convergence of the spline $S_h^p$. Note that the spline of order $2r + 1$ which interpolates the values of a polynomial of degree $2r$ coincides with this polynomial. However, the spline of order $2r + 1$ which interpolates the values of a polynomial of degree $2r + 1$ on the grid $\{kh\}$ restores the values of this polynomial at the mid-points $\{(k + 1/2)h\}$. This property will result in the vanishing moments property of the wavelets to be constructed later.
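Super-convergence at the midpoints can be observed numerically. The sketch below is an illustration under stated assumptions: it treats a periodic signal, takes the quadratic case ($p = 3$) with the Table 1 sequences $u^3 = (1, 6, 1)/8$ and $w^3 = (1, 1)/2$, and applies the filter $w^3/u^3$ in the frequency domain with a naive discrete Fourier transform. The function names are hypothetical.

```python
import cmath
import math

def midpoint_values(samples):
    """Midpoint values of the periodic quadratic interpolatory spline (order p = 3)."""
    n = len(samples)
    out = [0.0] * n
    for j in range(n):
        theta = 2.0 * math.pi * j / n
        # DFT coefficient of the samples at frequency theta.
        e_hat = sum(s * cmath.exp(-1j * theta * k) for k, s in enumerate(samples))
        u = (6.0 + 2.0 * math.cos(theta)) / 8.0      # u^3 on the unit circle
        w = (1.0 + cmath.exp(1j * theta)) / 2.0      # w^3 on the unit circle
        m_hat = e_hat * w / u                        # filtering as in (3.8)
        for k in range(n):
            out[k] += (m_hat * cmath.exp(1j * theta * k)).real / n
    return out

def midpoint_error(n):
    """Maximal midpoint error when interpolating sin(x) on a grid of step 2*pi/n."""
    h = 2.0 * math.pi / n
    approx = midpoint_values([math.sin(h * k) for k in range(n)])
    return max(abs(a - math.sin(h * (k + 0.5))) for k, a in enumerate(approx))

ratio = midpoint_error(32) / midpoint_error(64)
```

Halving the step reduces the midpoint error by a factor close to 16, that is, the error behaves like $h^4 = h^{2r+2}$ for $r = 1$, one order better than the general accuracy $h^{2r+1}$ of the spline.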

3.3  Quasi-interpolatory  splines 

We can see from (3.7) and (3.8) that in order to find the values at the midpoints of the spline interpolating the signal $e$, the signal has to be filtered with the filter whose transfer function is $zU_i^p(z)$. This filter has infinite impulse response (IIR). However, the property of super-convergence at the midpoints is not an exclusive attribute of the interpolatory splines. It is also inherent to the so-called local quasi-interpolatory splines of odd orders, which can be constructed using finite impulse response (FIR) filtering.

Definition 3.4 Let the function $f$ have $p$ continuous derivatives and $\mathbf{f} := \{f_k = f(hk)\}$, $k \in \mathbb{Z}$. The spline $S_h^p \in \mathcal{S}_h^p$ of order $p$ given by (3.5) is said to be the local quasi-interpolatory spline if the array $q$ of its coefficients is derived by FIR filtering the array of samples $\mathbf{f}$,

$$q(z^2) = \Gamma(z^2)\mathbf{f}(z^2), \quad (3.10)$$

where $\Gamma(z^2)$ is a Laurent polynomial, and the difference $|f(x) - S_h^p(x)| = O(f^{(p)}h^p)$. If $f$ is a polynomial of degree $p - 1$, then the spline $S_h^p(x) \equiv f(x)$.

If $w^p$ is the sequence defined in (3.1) then the midpoint values $m^p$ are produced by the following FIR filtering of the array of samples $\mathbf{f}$: $m^p(z^2) = zU^p(z)\mathbf{f}(z^2)$, $U^p(z) := z^{-1}\Gamma(z^2)w^p(z^2)$. Explicit formulas for the construction of quasi-interpolatory splines as well as estimates of the differences were established in [13]. In the present work we are interested in splines of odd orders $p = 2r + 1$. There are many FIR filters which generate quasi-interpolatory splines, but only one filter of minimal length $2r + 1$ for each order $p = 2r + 1$. Let $\lambda(z) := z^{-2} - 2 + z^2$.


Theorem 3.5 A quasi-interpolatory spline of order $p = 2r + 1$ can be produced by filtering (3.10) with filters $\Gamma$ of length no less than $2r + 1$. There exists a unique filter $\Gamma_m^r$ of length $2r + 1$ which produces the minimal quasi-interpolatory spline $S_m^{2r+1}(x)$. Its transfer function is:

$$\Gamma_m^r(z^2) = 1 + \sum_{k=1}^{r}\beta_k^r\lambda^k(z), \quad \text{where} \quad \left(\frac{2}{y}\arcsin\frac{y}{2}\right)^{2r+1} = \sum_{k=0}^{\infty}(-1)^k\beta_k^r y^{2k}. \quad (3.11)$$

If the function $f$ has $2r + 3$ derivatives then the following asymptotic relations hold for the midpoint values of the minimal quasi-interpolatory spline of odd order:

$$S_m^{2r+1}(h(k + 1/2)) = f(h(k + 1/2)) + h^{2r+2}f^{(2r+2)}(h(k + 1/2))A_r + O(f^{(2r+3)}h^{2r+3}),$$

$$A_r := \frac{(2r+1)b_{2r+2}(0)}{(2r+2)!} - \beta_{r+1}^r, \quad (3.12)$$

where $b_s(x)$ is the Bernoulli polynomial of degree $s$.


This implies that the super-convergence property is similar to that of the interpolatory splines. The asymptotic representation (3.12) provides tools for custom design of predicting splines retaining or even enhancing the approximation accuracy of the minimal spline at the midpoints.

Proposition 3.6 If the coefficients of the spline $S_\rho^{2r+1} \in \mathcal{S}^{2r+1}$ of order $2r + 1$ are derived as in (3.10) using the filter $\Gamma_\rho^r$ of length $2r + 3$, with the transfer function $\Gamma_\rho^r(z^2) = \Gamma_m^r(z^2) + \rho\lambda^{r+1}(z)$, then the spline restores polynomials of degree $2r + 1$ at the midpoints between nodes, for any real value $\rho$. However, if $\rho = -A_r$ then the spline restores polynomials of degree $2r + 3$.

If the parameter $\rho$ is chosen such that $\rho = (-1)^r|\rho|$ then the spline $S_\rho^{2r+1}$ possesses the smoothing property [14].
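The constants entering the minimal and extended filters can be computed in exact arithmetic. The sketch below rests on two assumptions that should be checked against [13]: that the coefficients $\beta_k^r$ are generated by the series $\left(\frac{2}{y}\arcsin\frac{y}{2}\right)^{2r+1} = \sum_k (-1)^k\beta_k^r y^{2k}$, and that $A_r = \frac{(2r+1)b_{2r+2}(0)}{(2r+2)!} - \beta_{r+1}^r$. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial

def bernoulli_numbers(n_max):
    """Bernoulli numbers B_0..B_n_max (with B_1 = -1/2); note b_s(0) = B_s."""
    b = [Fraction(1)]
    for n in range(1, n_max + 1):
        b.append(-sum(comb(n + 1, j) * b[j] for j in range(n)) / (n + 1))
    return b

def beta(r, k_max):
    """Coefficients beta_k^r for k = 0..k_max under the assumed arcsin series."""
    # Power series of (2/y) * arcsin(y/2) in powers of y^2.
    base = [Fraction(comb(2 * n, n), 16 ** n * (2 * n + 1)) for n in range(k_max + 1)]
    power = [Fraction(1)] + [Fraction(0)] * k_max
    for _ in range(2 * r + 1):   # raise the series to the power 2r + 1, truncated
        power = [sum(power[j] * base[k - j] for j in range(k + 1))
                 for k in range(k_max + 1)]
    return [(-1) ** k * c for k, c in enumerate(power)]

def a_const(r):
    """A_r = (2r+1) b_{2r+2}(0) / (2r+2)! - beta_{r+1}^r (assumed form)."""
    b = bernoulli_numbers(2 * r + 2)
    return (2 * r + 1) * b[2 * r + 2] / factorial(2 * r + 2) - beta(r, r + 1)[r + 1]

beta_quadratic = beta(1, 2)      # coefficients for r = 1
rho_extended = -a_const(1)       # parameter rho = -A_1 of Proposition 3.6
```

For $r = 1$ this yields $\beta_1^1 = -1/8$ and $\rho = -A_1 = 3/128$, which are the coefficients of $\lambda(z)$ and $\lambda^2(z)$ in the quadratic minimal and extended splines.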
3.4 Examples

3.4.1 Quadratic splines

Interpolatory spline. Let $a = z^{-1} + z$. Then

$$U_i^1(z) = \frac{4a}{z^2 + 6 + z^{-2}}, \quad \text{and} \quad 1 - U_i^1(z) = \frac{(a-2)^2}{z^{-2} + 6 + z^2}.$$

Minimal spline. The filters are

$$\Gamma_m^1(z^2) = 1 - \frac{1}{8}\lambda(z), \quad U_m^1(z) = \frac{-z^{-3} + 9z^{-1} + 9z - z^3}{16}, \quad \text{and} \quad 1 - U_m^1(z) = \frac{(a-2)^2(z^{-1} + 4 + z)}{16}.$$

Extended spline.

$$\Gamma_e^1(z^2) = \Gamma_m^1(z^2) + \frac{3}{128}\lambda^2(z), \quad U_e^1(z) = \frac{3z^{-5} - 25z^{-3} + 150z^{-1} + 150z - 25z^3 + 3z^5}{256},$$

$$1 - U_e^1(z) = -\frac{(a-2)^3(3z^{-2} + 18z^{-1} + 38 + 18z + 3z^2)}{256}.$$


Remark 3.7 In [5] Donoho presented a scheme where an odd sample is predicted by the value in the central point of the polynomial of odd degree which interpolates adjacent even samples. One can observe that our filter $U_m^1$ coincides with the filter derived by Donoho's scheme using the cubic interpolatory polynomial. The filter $U_e^1$ coincides with the filter derived using the interpolatory polynomial of fifth degree. On the other hand, the filter $U_i^1$ is closely related to the commonly used Butterworth filter [8]. Namely, in this case the filter transfer functions $\Phi_i^{1,l}(z) := (1 + U_i^1(z))/2$ and $\Phi_i^{1,h}(z) := (1 - U_i^1(z))/2$ coincide with the magnitude squared of the transfer functions of the discrete-time low-pass and high-pass half-band Butterworth filters of order 4, respectively.

3.4.2  Splines  of  fifth  order  (fourth  degree) 

Interpolatory spline.

$$U_i^2(z) = \frac{16(z^3 + 11z + 11z^{-1} + z^{-3})}{z^4 + 76z^2 + 230 + 76z^{-2} + z^{-4}}, \quad 1 - U_i^2(z) = \frac{(a-2)^3(a-10)}{z^4 + 76z^2 + 230 + 76z^{-2} + z^{-4}}.$$

Minimal spline. The filter is

$$U_m^2(z) = \frac{47(z^{-7} + z^7) + 89(z^{-5} + z^5) - 2277(z^{-3} + z^3) + 15965a}{27648}.$$
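The example filters can be cross-checked numerically against their definitions $U_i = z^{-1}w/u$ and $U = z^{-1}\Gamma w$. The sketch below evaluates both sides at a few points of the unit circle. It assumes the Table 1 sequences, $\lambda(z) = z^{-2} - 2 + z^2$, the quadratic filters $\Gamma = 1 - \lambda/8$ and $\Gamma + 3\lambda^2/128$, and, as a further assumption of this sketch, the fifth order minimal filter $\Gamma = 1 - \frac{5}{24}\lambda + \frac{47}{1152}\lambda^2$. The helper names are hypothetical.

```python
import cmath

def close(x, y, tol=1e-9):
    return abs(x - y) < tol

def check_filters(z):
    """Compare closed forms of the example filters with their definitions at a point z."""
    a = z + 1 / z
    lam = z ** -2 - 2 + z ** 2
    w3, u3 = (1 + z ** 2) / 2, (z ** -2 + 6 + z ** 2) / 8
    w5 = (z ** 4 + 11 * z ** 2 + 11 + z ** -2) / 24
    u5 = (z ** 4 + 76 * z ** 2 + 230 + 76 * z ** -2 + z ** -4) / 384
    U_i1 = w3 / (z * u3)                                   # quadratic, interpolatory
    U_m1 = (1 - lam / 8) * w3 / z                          # quadratic, minimal
    U_e1 = (1 - lam / 8 + 3 * lam ** 2 / 128) * w3 / z     # quadratic, extended
    U_i2 = w5 / (z * u5)                                   # fifth order, interpolatory
    U_m2 = (1 - 5 * lam / 24 + 47 * lam ** 2 / 1152) * w5 / z
    return [
        close(1 - U_i1, (a - 2) ** 2 / (8 * u3)),
        close(U_m1, (-z ** -3 + 9 / z + 9 * z - z ** 3) / 16),
        close(1 - U_m1, (a - 2) ** 2 * (1 / z + 4 + z) / 16),
        close(U_e1, (3 * z ** -5 - 25 * z ** -3 + 150 / z
                     + 150 * z - 25 * z ** 3 + 3 * z ** 5) / 256),
        close(1 - U_e1, -(a - 2) ** 3 * (3 * z ** -2 + 18 / z + 38
                                         + 18 * z + 3 * z ** 2) / 256),
        close(1 - U_i2, (a - 2) ** 3 * (a - 10) / (384 * u5)),
        close(U_m2, (47 * (z ** -7 + z ** 7) + 89 * (z ** -5 + z ** 5)
                     - 2277 * (z ** -3 + z ** 3) + 15965 * a) / 27648),
    ]

results = [check_filters(cmath.exp(1j * omega)) for omega in (0.3, 1.1, 2.0, 2.9)]
```

Each factor $(a - 2)^{r+1}$ in $1 - U$ vanishes at $z = 1$ to order $2r + 2$, which is what later produces the vanishing moments of the wavelets.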


4  Wavelet  transforms  using  spline  filters 

4.1  Choosing  the  filters  for  the  lifting  step 

In the previous section we presented a family of filters $U$ for the predicting step which originated from splines of various types. But, as is seen from (2.2), to accomplish the transform we have to define the filter $\beta$. There is remarkable freedom in the choice of these filters. The only requirement needed to guarantee the perfect reconstruction property of the transform is that $\beta(-z) = -\beta(z)$. In order to make the synthesis and analysis filters similar in their properties, we choose $\beta(z) = \tilde{U}(z)/2$, where $\tilde{U}$ means one of the filters $U$ presented above. In particular, $\tilde{U}$ may coincide with the filter $U$ which was used for the prediction.

We say that a wavelet $\psi$ has $m$ vanishing moments if the following relations hold: $\sum_{k \in \mathbb{Z}} k^s\psi(k) = 0$, $s = 0, 1, \dots, m - 1$.

Proposition 4.1 Suppose the filters $U(z)$ and $\beta(z) = \tilde{U}(z)/2$ are used for the predicting and lifting steps, respectively. If $1 - U(z)$ contains the factor $(z - 2 + 1/z)^r$ then the high-frequency analysis wavelets $\tilde{\psi}^1$ have $2r$ vanishing moments. If, in addition, $1 - \tilde{U}(z)$ contains the factor $(z - 2 + 1/z)^p$ then the synthesis wavelet $\psi^1$ has $2q$ vanishing moments, where $q = \min\{p, r\}$.

4.2  Implementation  of  the  transforms 

Suppose we have chosen the filter $\beta = \tilde{U}/2$. The functions $zU(z)$ and $z\tilde{U}(z)$ depend on $z^2$, and we write $F(z^2) := zU(z)$ and $\tilde{F}(z^2) := z\tilde{U}(z)$. Then the decomposition procedure is (see (2.1), (2.2)):

$$d_1^u(z) = d_1(z) - F(z)e_1(z), \quad e_1^u(z) = e_1(z) + \frac{1}{2z}\tilde{F}(z)d_1^u(z). \quad (4.1)$$

Equation (4.1) means that in order to obtain the detail array $d_1^u$, we must process the even array $e_1$ with the filter $F$, with transfer function $F(z)$, and subtract the filtered array from the odd array $d_1$. In order to obtain the smoothed array $e_1^u$, we must process the detail array $d_1^u$ with the filter $\Phi$ that has the transfer function $\Phi(z) = z^{-1}\tilde{F}(z)/2$ and add the filtered array to the even array $e_1$. But the filter $\Phi$ differs from $\tilde{F}/2$ only by a one-sample delay and it operates similarly. Thus, both operations of the decomposition are, in principle, identical. For the reconstruction the same operations are conducted in reverse order.

Therefore, it is sufficient to outline the implementation of the filtering with the function $F(z)$.

Implementation of FIR filters originating from local splines is straightforward and, therefore, we only make a few remarks on IIR filters originating from interpolatory splines. A detailed description can be found in [1]. Equations (3.2) and (3.3) imply that, when the interpolatory spline of order $2r + 1$ is used, the transfer function is $F(z) = P(z)/\prod_{n=1}^{r}\frac{1}{\gamma_n}(1 + \gamma_n z)(1 + \gamma_n z^{-1})$, where $P(z)$ is a Laurent polynomial. It means that the IIR filter $F$ can be split into a cascade consisting of a FIR filter with the transfer function $P(z)$, $r$ elementary causal recursive filters, denoted by $\overrightarrow{R}(n)$, and $r$ elementary anti-causal recursive filters, denoted by $\overleftarrow{R}(n)$. The causal and anti-causal filters operate as follows:

$$y = \overrightarrow{R}(n)x \iff y(l) = x(l) - \gamma_n y(l-1), \quad y = \overleftarrow{R}(n)x \iff y(l) = x(l) - \gamma_n y(l+1).$$

Example 4.2 (Example of recursive filter) We present IIR filters derived from the interpolatory splines of third order. Let $\gamma_1^1 = 3 - 2\sqrt{2} \approx 0.172$. Then

$$F_i^1(z) = 4\gamma_1^1\frac{1 + z}{(1 + \gamma_1^1 z)(1 + \gamma_1^1 z^{-1})}.$$

The filter can be implemented with the following cascade:

$$x_0(k) = 4\gamma_1^1(x(k) + x(k+1)), \quad x_1(k) = x_0(k) - \gamma_1^1 x_1(k-1), \quad y(k) = x_1(k) - \gamma_1^1 y(k+1).$$
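The cascade can be sketched for a finitely supported signal. The sketch below is an illustration under stated assumptions: the signal is extended by zeros on both sides, the recursions start from a zero state, and the padding length is chosen so that the truncated tails (of relative size about $\gamma^{40}$) are negligible. The function name and the padding parameter are hypothetical.

```python
import math
import random

GAMMA = 3.0 - 2.0 * math.sqrt(2.0)

def interpolatory_filter(x, pad=40):
    """Apply F(z) = 4g(1+z) / ((1+g z)(1+g/z)) to a zero-extended signal x."""
    xp = [0.0] * pad + [float(v) for v in x] + [0.0] * (pad + 1)
    n = len(xp) - 1
    x0 = [4.0 * GAMMA * (xp[k] + xp[k + 1]) for k in range(n)]   # FIR numerator
    x1, prev = [], 0.0
    for k in range(n):                                           # causal recursion
        prev = x0[k] - GAMMA * prev
        x1.append(prev)
    y, nxt = [0.0] * n, 0.0
    for k in range(n - 1, -1, -1):                               # anti-causal recursion
        nxt = x1[k] - GAMMA * nxt
        y[k] = nxt
    return xp, y

random.seed(2)
padded, midpoints = interpolatory_filter([random.uniform(-1.0, 1.0) for _ in range(16)])
# Residual of y(k-1) + 6 y(k) + y(k+1) = 4 (x(k) + x(k+1)) at interior indices.
residual = max(
    abs(midpoints[k - 1] + 6.0 * midpoints[k] + midpoints[k + 1]
        - 4.0 * (padded[k] + padded[k + 1]))
    for k in range(1, len(midpoints) - 1)
)
```

The residual checks that the output satisfies $u^3 * y = w^3 * x$ in the scaled form shown in the comment, which follows from $(1 + \gamma z)(1 + \gamma z^{-1})/\gamma = z + 6 + z^{-1}$ for $\gamma = 3 - 2\sqrt{2}$.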
Abstract

Given a spline function as a B-spline expansion, the object of knot removal is to remove as many knots as possible without perturbing the spline by more than a specified tolerance. In 1987 Lyche and Mørken proposed an efficient knot removal algorithm which determines
both  the  number  of  remaining  knots  and  their  position  automatically.  In  this  paper  we 
show  how  their  method  can  be  extended  to  knot  removal  techniques  for  multivariate 
tensor  product  splines.  We  propose  a  number  of  new  strategies  for  removing  as  many 
knots  as  possible,  and  discuss  some  of  the  advantages  and  challenges  posed  by  the  special 
structure  of  tensor  product  splines. 


1  Introduction 

Given a spline function we are often interested in an approximate representation requiring less data. The object of knot removal is to remove as many knots as possible from a given spline without perturbing the spline by more than a given tolerance. An efficient knot removal strategy presented in [6] determines both the number of remaining knots and their location automatically. This strategy was later extended to parametric curves and surfaces in [5], and incorporated with various constraints such as monotonicity and convexity in [1]. An efficient implementation of knot removal for the special
case  of  trilinear  splines  is  given  in  [3].  In  this  paper  we  address  some  of  the  questions 
and  problems  arising  when  extending  the  knot  removal  technique  to  multivariate  tensor 
product  splines. 

The  outline  of  this  paper  is  as  follows.  We  start  by  fixing  notation  and  presenting 
techniques  for  representing  tensor  product  splines.  We  then  proceed  with  generalizations 
of  coefficient  norms,  approximation  methods,  methods  for  ranking  the  knots  etc.,  as  we 
review  the  central  parts  of  the  knot  removal  strategy.  Two  different  ways  of  performing 
knot  removal  are  given  together  with  accompanying  strategies  for  finding  the  desired 
approximations.  We  end  the  paper  with  two  examples  demonstrating  various  aspects  of 
the  knot  removal  techniques  presented. 

2  Notation 

Let $\mathbf{d} = (d_k), \mathbf{m} = (m_k) \in \mathbb{Z}^s$ with $0 \le \mathbf{d} < \mathbf{m}$ (component-wise) for some positive integer $s$. Also let $t_k = (t_{k,i})_{i=1}^{m_k+d_k+1}$ be a knot vector with $d_k + 1$ equal knots at both ends and with no knot value occurring more than $d_k + 1$ times, for $k = 1, \dots, s$. In this paper we will treat the collection $\mathbf{t} = \{t_k\}_{k=1}^{s}$ as a “single” knot vector with “length” $\mathbf{m} + \mathbf{d} + 1$ defined to be the sum of the lengths of the knot vectors $t_k$, $k = 1, \dots, s$. Given such a knot vector we may form products of the basis functions associated with each individual knot vector $t_k$. By letting

$$B_{\mathbf{i}}(\mathbf{x}) = B_{\mathbf{i},\mathbf{d},\mathbf{t}}(\mathbf{x}) = \prod_{k=1}^{s}B_{i_k,d_k,t_k}(x_k) \quad \text{for } 1 \le \mathbf{i} \le \mathbf{m},$$

where $\mathbf{i} = (i_k) \in \mathbb{Z}^s$ and $\mathbf{x} = (x_k) \in \mathbb{R}^s$, we get a total of $\prod_{k=1}^{s}m_k$ new basis functions for the tensor product space $\mathbb{S}_{\mathbf{d},\mathbf{t}} = \bigotimes_{k=1}^{s}\mathbb{S}_{d_k,t_k}$. In this paper we let $B_{i_k,d_k,t_k}$ be the $i_k$th B-spline of degree $d_k$ associated with $t_k$, for $k = 1, \dots, s$.

To represent an element of $\mathbb{S}_{\mathbf{d},\mathbf{t}}$ we use a variant of the classical Kronecker product of matrices. Recall that if $A = (a_{ij})_{i,j=1}^{m_1,n_1} \in \mathbb{R}^{m_1,n_1}$ and $B = (b_{ij})_{i,j=1}^{m_2,n_2} \in \mathbb{R}^{m_2,n_2}$ then this product is given by $A \otimes B = (a_{ij}B)_{i,j=1}^{m_1,n_1}$. In this paper we will use the “equivalent” product defined by $A \otimes B = (Ab_{ij})_{i,j=1}^{m_2,n_2}$, which gives a more convenient ordering of the matrix elements for our use. Also recall that for real matrices $A, B, C, D$ we have the following useful relations (assuming that the matrix products and inverses are defined): $(A \otimes B)(C \otimes D) = (AC) \otimes (BD)$, $(A \otimes B)^{-1} = A^{-1} \otimes B^{-1}$ and $A \otimes B = P_1(B \otimes A)P_2$, for some permutation matrices $P_1$ and $P_2$. In addition, the product $A \otimes B$ will have linearly independent columns, provided the same holds for $A$ and $B$. For further properties of the Kronecker product we refer to [4].
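The two orderings of the Kronecker product and the relations above can be checked on small integer matrices. The sketch below is an illustration with hypothetical helper names; the test matrices are arbitrary. It also shows why the variant ordering is convenient: with coefficients stacked column by column, applying $A$ along the first index and $B$ along the second corresponds to multiplying by the variant product.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def kron(a, b):
    """Classical Kronecker product (a_ij B)."""
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def kron_variant(a, b):
    """The 'equivalent' product (A b_ij): the classical product with swapped factors."""
    return kron(b, a)

def vec(x):
    """Stack the columns of a matrix into one list."""
    return [x[i][j] for j in range(len(x[0])) for i in range(len(x))]

A, C = [[1, 2], [3, 4]], [[0, 1], [5, -2]]
B, D = [[2, -1, 0], [1, 3, 1], [0, 4, 2]], [[1, 0, 2], [-3, 1, 1], [2, 2, 0]]
X = [[1, -2, 3], [4, 0, 5]]          # 2 x 3, so that A X B^T is defined

# Mixed-product rule for the variant ordering.
mixed_ok = (matmul(kron_variant(A, B), kron_variant(C, D))
            == kron_variant(matmul(A, C), matmul(B, D)))

# vec(A X B^T) = (A b_ij) vec(X) when vec stacks columns.
B_t = [list(row) for row in zip(*B)]
lhs = [row[0] for row in matmul(kron_variant(A, B), [[v] for v in vec(X)])]
vec_ok = lhs == vec(matmul(matmul(A, X), B_t))
```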

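An element

$$f(\mathbf{x}) = \sum_{i_1=1}^{m_1}\cdots\sum_{i_s=1}^{m_s}f_{i_1,\dots,i_s}\prod_{k=1}^{s}B_{i_k,d_k,t_k}(x_k) = \sum_{\mathbf{i} \le \mathbf{m}}f_{\mathbf{i}}B_{\mathbf{i},\mathbf{d},\mathbf{t}}(\mathbf{x}) \in \mathbb{S}_{\mathbf{d},\mathbf{t}}$$

can now be written

$$f(\mathbf{x}) = B_{\mathbf{t}}^T\mathbf{f},$$

where $B_{\mathbf{t}} = \bigotimes_{k=1}^{s}B_{t_k}$ with $B_{t_k} = (B_{1,d_k,t_k}, \dots, B_{m_k,d_k,t_k})^T$ for $k = 1, \dots, s$. Here $\mathbf{f}$ is a vector containing the B-spline coefficients $F = (f_{i_1,\dots,i_s})$ of $f$, given by $\mathbf{f} = \mathrm{vec}(F) := \sum_{\mathbf{i} \le \mathbf{m}}f_{\mathbf{i}}e_{\mathbf{i}}$, where $e_{\mathbf{i}} = \bigotimes_{k=1}^{s}e_{i_k}$ with $e_{i_k} \in \mathbb{R}^{m_k}$. Finally, for a tensor of real coefficients $F = (f_{\mathbf{i}})_{1 \le \mathbf{i} \le \mathbf{m}} \in \mathbb{R}^{\mathbf{m}}$ we let $F^{(k)}$ denote the tensor $F$ with its elements rearranged according to the cyclic permutation of the $s$-tuple $\{1, 2, \dots, s\}$ given by $\sigma_k = \{k, k+1, \dots, s, 1, \dots, k-1\}$, for $k = 1, \dots, s$.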

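Finally, for a spline $f = \sum_{\mathbf{i} \le \mathbf{m}}f_{\mathbf{i}}B_{\mathbf{i},\mathbf{d},\mathbf{t}}$ we define a class of weighted $l_p$-norms of its B-spline coefficients, given by

$$\|f\|_{l_p,\mathbf{t}} = \begin{cases}\left(\sum_{\mathbf{i} \le \mathbf{m}}w_{\mathbf{i}}|f_{\mathbf{i}}|^p\right)^{1/p}, & \text{for } 1 \le p < \infty, \\ \max_{1 \le \mathbf{i} \le \mathbf{m}}|f_{\mathbf{i}}|, & \text{for } p = \infty,\end{cases}$$

where the weights are given by $w_{\mathbf{i}} = \prod_{k=1}^{s}\frac{t_{k,i_k+d_k+1} - t_{k,i_k}}{d_k + 1}$, for $1 \le \mathbf{i} \le \mathbf{m}$. Using the notation introduced above we have that $\|f\|_{l_p,\mathbf{t}} = \|W_{\mathbf{t}}^{1/p}\mathbf{f}\|_{l_p}$ $(p \ge 1)$, where $W_{\mathbf{t}}$ is a diagonal scaling matrix given by

$$W_{\mathbf{t}} = \bigotimes_{k=1}^{s}W_{t_k}, \quad W_{t_k} = \mathrm{diag}\left(\frac{t_{k,i+d_k+1} - t_{k,i}}{d_k + 1}\right)_{i=1}^{m_k}.$$

These coefficient norms are easy to compute and are known to approximate the ordinary $L_p$-norms well for splines of moderate degree [2, 6]. In the algorithms we use $p = 2$ when computing approximations and $p = \infty$ to measure the error.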
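The coefficient norms are indeed cheap to evaluate. The sketch below assumes the usual knot-difference weights $(t_{i+d+1} - t_i)/(d + 1)$ in each parameter direction and their products in the tensor case; it uses 0-based indices and stores the coefficient tensor as a dictionary keyed by multi-indices. The function names and the sample knot vectors are hypothetical.

```python
import itertools
import math

def knot_weights(t, d):
    """Univariate weights (t[i+d+1] - t[i]) / (d + 1) for i = 0..m-1 (0-based indices)."""
    m = len(t) - d - 1
    return [(t[i + d + 1] - t[i]) / (d + 1) for i in range(m)]

def weighted_norm(coeffs, knots, degrees, p):
    """Weighted l_p norm; coeffs maps multi-indices (tuples) to B-spline coefficients."""
    if math.isinf(p):
        return max(abs(c) for c in coeffs.values())
    weights = [knot_weights(t, d) for t, d in zip(knots, degrees)]
    total = 0.0
    for idx in itertools.product(*(range(len(w)) for w in weights)):
        w_i = math.prod(w[i] for w, i in zip(weights, idx))   # product over directions
        total += w_i * abs(coeffs[idx]) ** p
    return total ** (1.0 / p)

knots = ([0, 0, 0, 1, 2, 3, 3, 3], [0, 0, 0.5, 1, 1])    # degrees 2 and 1
degrees = (2, 1)
ones = {(i, j): 1.0 for i in range(5) for j in range(3)}  # the constant spline f = 1
norm_1 = weighted_norm(ones, knots, degrees, 1)
norm_inf = weighted_norm(ones, knots, degrees, math.inf)
```

For the constant spline the weighted $l_1$-norm equals the area of the parameter domain (here $3 \times 1$), because the weights in each direction sum to the length of the knot interval.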

3  The  knot  removal  algorithm 

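Given an element $f \in \mathbb{S}_{\mathbf{d},\mathbf{t}}$, a tolerance $\varepsilon > 0$ and some norm $\|\cdot\|$, the goal of the knot removal algorithm presented in [6] is to find a subspace $\mathbb{S}_{\mathbf{d},\boldsymbol{\tau}}$ of $\mathbb{S}_{\mathbf{d},\mathbf{t}}$ $(\boldsymbol{\tau} \subset \mathbf{t})$ and an element $g \in \mathbb{S}_{\mathbf{d},\boldsymbol{\tau}}$ with $\|f - g\| \le \varepsilon$, and where we want $\boldsymbol{\tau}$ to be of minimal length. In this section we review the basic parts of this algorithm as we extend the theory to tensor product splines. Further details of the material in this section can be found in [2].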

3.1  Finding  approximations 

To approximate $f \in \mathbb{S}_{\mathbf{d},\mathbf{t}}$ in a subspace $\mathbb{S}_{\mathbf{d},\boldsymbol{\tau}}$, where $\boldsymbol{\tau}$ is of “length” $\mathbf{n} + \mathbf{d} + 1$ with $\mathbf{n} \le \mathbf{m}$, we use the spline $g$ which is the best approximation to $f$ in the $l_{2,\mathbf{t}}$-norm. In other words, the spline we seek will be the solution to the minimization problem $\min_{h \in \mathbb{S}_{\mathbf{d},\boldsymbol{\tau}}}\|f - h\|_{l_2,\mathbf{t}}^2$. Solving this problem is equivalent to solving the linear least squares problem given by

$$\min_{\mathbf{c}}\|W_{\mathbf{t}}^{1/2}(A\mathbf{c} - \mathbf{f})\|_{l_2}, \quad (3.1)$$

where $A = \bigotimes_{k=1}^{s}A_k$ is the knot insertion matrix from $\boldsymbol{\tau}$ to $\mathbf{t}$ (i.e. $A_k$ is the knot insertion matrix from $\tau_k$ to $t_k$, for $k = 1, \dots, s$), $\mathbf{f} = \mathrm{vec}(F)$ are the given B-spline coefficients of $f$ in $\mathbb{S}_{\mathbf{d},\mathbf{t}}$ and $\mathbf{c} = \mathrm{vec}(C)$ are the unknown B-spline coefficients of $g$ in $\mathbb{S}_{\mathbf{d},\boldsymbol{\tau}}$. Since the knot insertion matrix $A$ has full rank and $W_{\mathbf{t}}$ is non-singular, the normal equations $A^TW_{\mathbf{t}}A\mathbf{c} = A^TW_{\mathbf{t}}\mathbf{f}$ associated with the system (3.1) will have a unique solution, which can be found ([2, 3]) by solving a series of $s$ tensor equation systems given by

$$(A_k^TW_{t_k}A_k)D_k^{(k)} = (A_k^TW_{t_k})D_{k-1}^{(k)}, \quad (3.2)$$

for $k = 1, \dots, s$. Here $D_k \in \mathbb{R}^{\mathbf{n}_k}$ with $\mathbf{n}_k = (n_1, \dots, n_k, m_{k+1}, \dots, m_s)$, and we let $D_0 = F$ and set the coefficients of the approximation $g$ equal to the solution of the last tensor equation system, $C = D_s$. The tensor equations (3.2) can be efficiently solved by calculating the Cholesky factorization of the banded coefficient matrix $(A_k^TW_{t_k}A_k)$ and solving for each right hand side in the tensor $(A_k^TW_{t_k})D_{k-1}^{(k)}$.
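The equivalence between the full normal equations and the direction-by-direction solves can be checked on a small bivariate example. The sketch below is an illustration under stated assumptions: the matrices `A1` and `A2` are small stand-ins with the shape of knot insertion matrices, the weights are arbitrary positive numbers, and a plain Gauss-Jordan elimination in exact arithmetic replaces the banded Cholesky factorization. All names are hypothetical.

```python
from fractions import Fraction

def mat(rows):
    return [[Fraction(v) for v in row] for row in rows]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def kron(a, b):
    """Classical Kronecker product (a_ij B)."""
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def solve(m, rhs):
    """Solve m X = rhs for a square nonsingular m by Gauss-Jordan elimination."""
    n = len(m)
    aug = [list(m[i]) + list(rhs[i]) for i in range(n)]
    for c in range(n):
        piv = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[piv] = aug[piv], aug[c]
        aug[c] = [v / aug[c][c] for v in aug[c]]
        for r in range(n):
            if r != c:
                aug[r] = [v - aug[r][c] * p for v, p in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def vec(x):
    return [[x[i][j]] for j in range(len(x[0])) for i in range(len(x))]

A1 = mat([[1, 0], ["1/2", "1/2"], [0, 1]])                 # 3 x 2
A2 = mat([[1, 0], ["2/3", "1/3"], ["1/3", "2/3"], [0, 1]]) # 4 x 2
W1 = mat([["1/2", 0, 0], [0, 1, 0], [0, 0, "1/2"]])
W2 = mat([["1/3", 0, 0, 0], [0, "2/3", 0, 0], [0, 0, "2/3", 0], [0, 0, 0, "1/3"]])
F = mat([[1, 4, -2, 0], [3, 1, 5, 2], [0, -1, 2, 6]])      # D_0 = F, 3 x 4

# Direction 1, then direction 2 (second index handled via transposition).
G1, G2 = matmul(transpose(A1), W1), matmul(transpose(A2), W2)
D1 = solve(matmul(G1, A1), matmul(G1, F))
C = transpose(solve(matmul(G2, A2), matmul(G2, transpose(D1))))

# Reference: full normal equations with Kronecker matrices acting on vec(F).
A, W = kron(A2, A1), kron(W2, W1)
G = matmul(transpose(A), W)
c_ref = solve(matmul(G, A), matmul(G, vec(F)))
agree = vec(C) == c_ref
```

Only small systems of the size of one parameter direction are factored in the sequential version, whereas the reference solve works with a matrix whose size is the product of all directions.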

3.2  Ranking  the  knots 

The  final  approximation  to  the  initial  spline  is  found  by  searching  through  a  sequence  of 
approximations,  constructed  by  using  the  approximation  method  of  the  previous  section, 
on  subsets  of  the  knots  of  the  initial  spline.  These  subsets  are  calculated  by  associating  a 
weight  with  each  interior  knot,  representing  a  rough  measure  of  its  importance.  See  [6]  for 
the details. For higher dimensional tensor product splines we set the weight for a given
knot  to  the  maximum  of  the  weights  corresponding  to  this  knot  when  the  calculation  is 
iterated  over  the  “remaining”  parameter  directions.  We  refer  to  [2]  for  further  details. 

4  Knot  removal  methods 

When  removing  knots  from  a  tensor  product  spline  we  are  faced  with  more  options  than 
in  the  case  of  a  spline  curve.  In  this  section  we  present  two  different  ways  of  performing 
knot removal. The first one, studied in [2] and based on a symmetric approach, treats all the
parameter  directions  of  a  tensor  product  spline  simultaneously,  while  the  second  one  will 
treat  one  parameter  direction  at  a  time. 

4.1  Knot  removal  based  on  a  symmetric  approach 

If we let $G_f(\tau)$ denote the approximation to $f \in \mathbb{S}_{d,t}$ defined on the knot vector
$\tau$, we see that the approximations in the sequence mentioned above can be written
$\{G_f(\tau_j)\}_{j=0}^{N}$, where $\tau_j$ is constructed from $t$ by removing $j$ of its interior knots, and
$N = \sum_{k=1}^{s} [m_k - (d_k + 1)]$ is the total number of interior knots of $t$. Given such a
sequence of approximations we can perform a search on the index $j$ to determine an
approximation $g^* = G_f(\tau^*)$ to the initial spline $f$ with a preferably short knot vector
$\tau^*$, and with the property that $\| f - g^* \|_{\ell^\infty, t} \le \varepsilon$, where $\varepsilon$ is the specified tolerance. If the
knot vector $\tau^*$ is not equal to either of the two knot vectors $\tau_0$ or $\tau_N$ we may repeat the
process to find a new approximation based on $g^*$ as proposed in [6]. Taking into account
how the sequence $\{G_f(\tau_j)\}_{j=0}^{N}$ was constructed we expect the error $\| f - G_f(\tau_j) \|_{\ell^\infty, t}$
to decrease, but not necessarily strictly, for decreasing values of the search parameter
$j$. How the search among the possible approximations is done will generally depend on
a  number  of  factors,  including  some  which  will  be  discussed  later  through  examples. 
Also  note  that  we  only  have  to  compute  approximations  for  indexes  actually  used  in 
the  search.  By  treating  all  the  directions  simultaneously  we  take  into  consideration  the 
inherent  symmetry  of  the  problem.  As  we  will  see  later  this  will  in  some  cases  enable 
us  to  remove  more  knots  than  by  treating  one  parameter  direction  at  a  time,  but  it  will 
also  lead  to  more  complicated  and  slower  code  in  an  implementation. 
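A search on the index $j$ that only computes the approximations it actually probes might look like the sketch below. It is a minimal illustration, assuming the error behaves monotonically in $j$ (the text notes this is expected but not guaranteed); `error_of` is a hypothetical callback that builds the approximation with $j$ knots removed and returns its error.

```python
def binary_search_removal(error_of, n_interior, tol):
    """Return the largest j in [0, n_interior] with error_of(j) <= tol, or None.

    Assumes error_of is non-decreasing in j; only probed indices are evaluated.
    """
    if error_of(0) > tol:
        return None
    lo, hi = 0, n_interior
    while lo < hi:
        mid = (lo + hi + 1) // 2      # round up so the interval always shrinks
        if error_of(mid) <= tol:
            lo = mid                  # mid is acceptable, try removing more knots
        else:
            hi = mid - 1              # mid removes too many knots
    return lo
```

With a table of errors standing in for the real approximations, `binary_search_removal(lambda j: errors[j], len(errors) - 1, tol)` touches only a logarithmic number of entries.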

4.2  Knot  removal  for  one  parameter  direction  at  a  time 

In the second knot removal method we start by thinking of a spline $f \in \mathbb{S}_{d,t}$ as a series
of  parametric  curves  in  corresponding  high  dimensional  spaces.  We  can  then  perform  a 
parametric  knot  removal  for  each  parameter  direction.  The  advantage  of  this  approach 
is  that  it  is  easy  to  implement  since  we  may  use  existing  knot  removal  routines  for  spline 
curves  with  only  minor  modifications. 

In the following discussion we let $\varepsilon = \sum_{i=1}^{s} \varepsilon_i$, with $\varepsilon_i > 0$ for all $i$, be a given
tolerance. Also let $f(x) = \sum_{i \le m} f_i B_{i,d,t}(x) = B_t^T \mathbf{f}$ be a spline in $\mathbb{S}_{d,t} = \bigotimes_{k=1}^{s} \mathbb{S}_{d_k,t_k}$, with
$B_t^T = \bigotimes_{k=1}^{s} B_{t_k}^T$ and $\mathbf{f} = \mathrm{vec}(F)$. We start by identifying a series of parametric curves
which may be naturally associated with this tensor product spline. We say that the
spline $f$ consists of the curves $f_k(x_k)$, for $k = 1, \ldots, s$, where $f_k(x_k)$ is the parametric
curve in $\mathbb{R}^{M_k}$ for $M_k = \big(\prod_{p=1}^{k-1} m_p\big)\big(\prod_{p=k+1}^{s} m_p\big)$, given by

$$f_k(x_k) = \Big[ \Big(\bigotimes_{i=1}^{k-1} I_{m_i}\Big) \otimes B_{t_k}^T \otimes \Big(\bigotimes_{i=k+1}^{s} I_{m_i}\Big) \Big] \mathbf{f}.$$

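We now return to the problem of finding a preferably short knot vector $\tau \subseteq t$ and a spline
$g(x) = \sum_{i \le n} c_i B_{i,d,\tau}(x) \in \mathbb{S}_{d,\tau} = \bigotimes_{k=1}^{s} \mathbb{S}_{d_k,\tau_k}$ with the property that $\| f - g \|_{\ell^\infty, t} \le \varepsilon$. To
apply knot removal to $f \in \mathbb{S}_{d,t}$ we can now go through the following steps for $k = 1, \ldots, s$:

1. Apply parametric knot removal with the tolerance $\varepsilon_k$ to the parametric curve

$$f_k(x_k) = \Big[ \Big(\bigotimes_{i=1}^{k-1} I_{n_i}\Big) \otimes B_{t_k}^T \otimes \Big(\bigotimes_{i=k+1}^{s} I_{m_i}\Big) \Big] \mathbf{f}_{k-1},$$

defined on $t_k$, starting with $\mathbf{f}_0 = \mathbf{f}$.

2. This will produce a new parametric curve defined on the knot vector $\tau_k \subseteq t_k$,

$$\tilde{f}_k(x_k) = \Big[ \Big(\bigotimes_{i=1}^{k-1} I_{n_i}\Big) \otimes B_{\tau_k}^T \otimes \Big(\bigotimes_{i=k+1}^{s} I_{m_i}\Big) \Big] \mathbf{f}_k,$$

where $\mathbf{f}_k = \mathrm{vec}(F_k)$ for $F_k \in \mathbb{R}^{n_1, \ldots, n_k, m_{k+1}, \ldots, m_s}$.

3. We also have that

$$\tilde{f}_k(x_k) = \Big[ \Big(\bigotimes_{i=1}^{k-1} I_{n_i}\Big) \otimes B_{\tau_k}^T \otimes \Big(\bigotimes_{i=k+1}^{s} I_{m_i}\Big) \Big] \mathbf{f}_k = \Big[ \Big(\bigotimes_{i=1}^{k-1} I_{n_i}\Big) \otimes B_{t_k}^T A_k \otimes \Big(\bigotimes_{i=k+1}^{s} I_{m_i}\Big) \Big] \mathbf{f}_k,$$

where $A_k$ is the knot insertion matrix from $\tau_k$ to $t_k$.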

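4. And consequently

$$\| f_k - \tilde{f}_k \|_{\ell^\infty, t_k} = \Big\| \mathbf{f}_{k-1} - \Big[ \Big(\bigotimes_{i=1}^{k-1} I_{n_i}\Big) \otimes A_k \otimes \Big(\bigotimes_{i=k+1}^{s} I_{m_i}\Big) \Big] \mathbf{f}_k \Big\|_\infty \le \varepsilon_k.$$

Finally we let the coefficients of the function $g(x) = B_\tau^T \mathbf{c} \in \mathbb{S}_{d,\tau}$ be $\mathbf{c} = \mathrm{vec}(F_s)$, and
we have the following result.

Theorem 4.1 If we let $f(x) = B_t^T \mathbf{f} \in \mathbb{S}_{d,t}$ and $g(x) = B_\tau^T \mathbf{c} \in \mathbb{S}_{d,\tau}$ be the tensor
product splines from the discussion above, then we have $\| f - g \|_{\ell^\infty, t} \le \varepsilon$.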


Proof: Let $A = \bigotimes_{k=1}^{s} A_k$ be the knot insertion matrix from $\tau$ to $t$, and let $f_0(x) = B_t^T \mathbf{f}_0$

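be equal to $f$ and $f_s(x) = B_\tau^T \mathbf{f}_s$ be equal to $g$, i.e. $\mathbf{f}_0 = \mathbf{f}$ and $\mathbf{f}_s = \mathbf{c}$. Then

$$\begin{aligned}
\| f - g \|_{\ell^\infty, t} &= \| \mathbf{f}_0 - A \mathbf{f}_s \|_\infty \\
&= \Big\| \mathbf{f}_0 + \sum_{k=2}^{s} \Big( \Big[ \Big(\bigotimes_{i=1}^{k-1} A_i\Big) \otimes \Big(\bigotimes_{i=k}^{s} I_{m_i}\Big) \Big] \mathbf{f}_{k-1} - \Big[ \Big(\bigotimes_{i=1}^{k-1} A_i\Big) \otimes \Big(\bigotimes_{i=k}^{s} I_{m_i}\Big) \Big] \mathbf{f}_{k-1} \Big) - \Big(\bigotimes_{k=1}^{s} A_k\Big) \mathbf{f}_s \Big\|_\infty \\
&\le \sum_{k=1}^{s} \Big\| \Big[ \Big(\bigotimes_{i=1}^{k-1} A_i\Big) \otimes \Big(\bigotimes_{i=k}^{s} I_{m_i}\Big) \Big] \Big( \mathbf{f}_{k-1} - \Big[ \Big(\bigotimes_{i=1}^{k-1} I_{n_i}\Big) \otimes A_k \otimes \Big(\bigotimes_{i=k+1}^{s} I_{m_i}\Big) \Big] \mathbf{f}_k \Big) \Big\|_\infty \\
&\le \sum_{k=1}^{s} \Big\| \mathbf{f}_{k-1} - \Big[ \Big(\bigotimes_{i=1}^{k-1} I_{n_i}\Big) \otimes A_k \otimes \Big(\bigotimes_{i=k+1}^{s} I_{m_i}\Big) \Big] \mathbf{f}_k \Big\|_\infty \le \sum_{k=1}^{s} \varepsilon_k = \varepsilon.
\end{aligned}$$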


5  Examples 

The  knot  removal  methods  presented  above  have  been  implemented  and  tested  on  a 
computer.  In  this  section  we  present  trivariate  examples  from  this  implementation  and 
propose  different  knot  removal  strategies  depending  on  the  problem  at  hand.  See  [3]  for 
a  detailed  description  of  this  implementation. 

Example 5.1 In this first example we will compare two different strategies for searching
through a list of approximations $\{G_f(\tau_j)\}_{j=0}^{N}$ introduced above. We will consider the
knot removal method treating one parameter direction at a time, which means that we
end up solving a parametric knot removal problem with tolerance $\varepsilon_i = \varepsilon/3$, $i = 1, 2, 3$,
for each of the three parameter directions.

To  improve  efficiency  the  parametric  knot  removal  routine  implemented  is  constructed 
in  a  way  that  lets  it  abort  the  computation  if  an  approximation  for  any  component  of  the 
parametric  curve  fails  to  lie  within  the  specified  tolerance.  This  fact  suggests  a  search 
strategy  where  we  compute  successive  approximations  to  the  initial  spline  by  adding  one 
interior  knot  at  a  time,  starting  with  zero  interior  knots,  and  where  each  intermediate 
approximation  is  given  by  the  first  of  these  approximation  processes  to  be  completed. 
Intuitively we would expect such a sequential search strategy to perform best for “large”
tolerances and/or large problems, where there is more to gain by aborting an approximation
process. In this example we have compared this search strategy with a strategy proposed
in  [6]  using  a  binary  search. 

In all the tests we have used an initial trilinear spline constructed by sampling the
function given by f(x, y, z) = |[sin(2πx) + sin(2πy) + sin(2πz)] in the points specified
by a uniform 3-dimensional grid on the domain $\Omega = [0, 1]^3$, for four selected grid sizes.
Each spline was reduced by using both of the search strategies mentioned above, for
tolerances varying from $\varepsilon = 0.001$ to $\varepsilon = 0.01$. Both of the search strategies produced
approximately the same end grid size in each test.

In Figure 1 the CPU-time of the two search strategies is plotted against the tolerance
for the selected grid sizes. We observe that the reductions utilizing a binary search
perform best on small problems, while the sequential search strategy turns out to be
superior for large problems.

(a) Problem size $25^3$ (b) Problem size $100^3$

Fig. 1. A comparison of two different search strategies.

Example 5.2 In this example we compare the two different knot removal methods
presented in this paper. Here we have used an initial trilinear spline constructed by
sampling a function given by $f(x, y, z) = e^{\sin(2\pi x y z)}$ in the points specified by a uniform
3-dimensional grid on the domain $\Omega = [0, 1]^3$, for varying grid sizes. Each spline was
reduced by both the method based on the symmetric approach and the method treating
one parameter direction at a time.


The results are presented in Table 1. We see that in our implementation the method
using the symmetric approach is by far the slowest method. However, at least for the
type of function considered in this example the method based on the symmetric approach
will give a much better reduction than the other.

Knot Removal for Trilinear Splines, Tolerance ε = 0.005

              Parametric, binary search                  Symmetric, binary search
Start grid    CPU      End grid        Error             CPU      End grid        Error
100^3         16.53    72 x 65 x 65    4.93800 · 10^-3   63.23    54 x 53 x 53    4.92080 · 10^-3
150^3         56.44    81 x 71 x 71    4.80243 · 10^-3   122.2    51 x 49 x 49    4.77236 · 10^-3
200^3         99.48    68 x 66 x 66    4.91142 · 10^-3   300.9    54 x 50 x 51    4.98275 · 10^-3
250^3         165.3    74 x 62 x 62    4.74970 · 10^-3   584.8    61 x 56 x 56    4.85916 · 10^-3
300^3         256.8    72 x 62 x 62    4.85316 · 10^-3   1094     60 x 54 x 53    4.81551 · 10^-3
350^3         391.4    75 x 65 x 63    4.77028 · 10^-3   1312     54 x 50 x 50    4.92422 · 10^-3
400^3         494.6    71 x 59 x 63    4.79631 · 10^-3   1865     54 x 50 x 50    4.81064 · 10^-3

Tab. 1. Knot removal for the trilinear splines of Example 5.2.

Abstract

Fixed- and free-knot least-squares data approximation by polynomial splines is
considered. Classes of knot-placement algorithms are discussed. A practical example of knot
placement is presented, and future possibilities in free-knot spline approximation are
addressed.


1  Introduction 

The representation of univariate polynomial splines in terms of B-splines is reviewed
(Section 2), leading to the problem of obtaining fixed- and free-knot $\ell_2$ spline
approximations (Section 3). The accepted approach to the fixed-knot case is recalled (Section 4)
and the manner in which spline uncertainties can be evaluated given (Section 5). The
importance of families of spline approximants is emphasised (Section 6). The free-knot
problem is formulated (Section 7) and several of the established and some lesser-known
knot-placement strategies reviewed (Section 8). Conclusions are drawn and future
possibilities indicated (Section 9).

2  Univariate  polynomial  splines 

Let $I := [x_{\min}, x_{\max}]$ be an interval of the $x$-axis, and $x_{\min} = \lambda_0 < \lambda_1 \le \lambda_2 \le \cdots \le \lambda_{N-1} \le \lambda_N < \lambda_{N+1} = x_{\max}$
a partition of $I$. A spline $s(x)$ of order $n$ (degree $n - 1$) on $I$ is
a piecewise polynomial of order $n$ on $(\lambda_j, \lambda_{j+1})$, $j = 0, \ldots, N$. The spline $s$ is $C^{n-k-1}$ at
$\lambda_j$ if $\mathrm{card}(\lambda_\ell = \lambda_j, \ell \in \{1, \ldots, N\}) = k$. The partition points $\lambda = \{\lambda_j\}_1^N$ are the (interior)
knots of $s$. To specify the complete set of knots needed to define $s$ on $I$ in terms of
B-splines, the knots $\{\lambda_j\}_1^N$ are augmented by knots $\{\lambda_j\}_{1-n}^{-1}$ and $\{\lambda_j\}_{N+2}^{q}$, $q = N + n$,
satisfying

$$\lambda_{1-n} \le \cdots \le \lambda_0, \qquad \lambda_{N+1} \le \cdots \le \lambda_q.$$

For many purposes, a good choice [10] of additional knots is

$$\lambda_{1-n} = \cdots = \lambda_0, \qquad \lambda_{N+1} = \cdots = \lambda_q.$$




Data  approximation  by  polynomial  splines 




It readily permits derivative boundary conditions to be incorporated in spline
approximants [7]. On $I$, $s(x)$ has the B-spline representation [5]

$$s(x) := s(c, \lambda; x) = \sum_{j=1}^{q} c_j N_{n,j}(\lambda; x), \qquad (2.1)$$

where $N_{n,j}(\lambda; x)$ is the B-spline [5, 12] of order $n$ with knots $\{\lambda_k\}_{j-n}^{j}$ and $c = (c_1, \ldots, c_q)^T$
are the B-spline coefficients of $s$. Each $N_{n,j}(\lambda; x)$ is a spline with knots $\lambda$, is non-negative
and has compact support. Specifically,

$$N_{n,j}(\lambda; x) > 0, \quad x \in (\lambda_{j-n}, \lambda_j), \qquad \mathrm{supp}(N_{n,j}(\lambda; x)) = [\lambda_{j-n}, \lambda_j]. \qquad (2.2)$$

The B-spline basis $\{N_{n,j}(\lambda; x)\}_{j=1}^{q}$ for splines of order $n$ with knots $\lambda$ is generally very
well-conditioned [10]. Moreover, the basis functions for any $x \in [x_{\min}, x_{\max}]$ can be
formed in an unconditionally stable manner using a three-term recurrence relation [5, 12].
Specifically, the relative errors in the values $\mathrm{fl}(N_{n,j}(\lambda; x))$ of the basis functions computed
using IEEE floating-point arithmetic [18] satisfy

$$|\mathrm{fl}(N_{n,j}(\lambda; x)) - N_{n,j}(\lambda; x)| \le C n N_{n,j}(\lambda; x)\, \eta,$$

where $C$ is a constant that is a small multiple of unity and $\eta$ is the unit roundoff of the
floating point processor [5]. The B-spline basis for splines of order 3 with interior knots
at $x = (1, 2, 5)^T$ and coincident end knots at $x = 0$ and 10, is shown in Figure 1.


Fig.  1.  The  B-spline  basis  for  splines  of  order  3  for  some  nonuniformly  spaced  knots. 
The  first  three  B-spline  basis  functions  are  shown  as  solid  lines  and  the  remaining  three 
as  dotted  lines. 
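A three-term recurrence of the kind referred to above can be sketched as follows for the knot set of Figure 1. This is a plain textbook form of the Cox-de Boor recurrence, not the routine of [5]; the function name `bspline_basis` is hypothetical, and the array index is zero-based, so entry `j` corresponds to $N_{n,j+1}$ in the notation of (2.1).

```python
import numpy as np

def bspline_basis(knots, n, x):
    """Values at x of all B-splines of order n on the full knot vector `knots`.

    Three-term recurrence; terms with a zero denominator are dropped.
    """
    t = np.asarray(knots, dtype=float)
    # Order 1: indicator functions of the knot intervals; the right end point
    # x_max is assigned to the last interval of positive length.
    B = np.zeros(len(t) - 1)
    for j in range(len(t) - 1):
        if t[j] <= x < t[j + 1] or (x == t[-1] and t[j] < t[j + 1] == t[-1]):
            B[j] = 1.0
    for k in range(2, n + 1):
        new = np.zeros(len(t) - k)
        for j in range(len(t) - k):
            left = t[j + k - 1] - t[j]
            right = t[j + k] - t[j + 1]
            if left > 0:
                new[j] += (x - t[j]) / left * B[j]
            if right > 0:
                new[j] += (t[j + k] - x) / right * B[j + 1]
        B = new
    return B

# Order 3, interior knots (1, 2, 5), coincident end knots at 0 and 10.
full_knots = [0, 0, 0, 1, 2, 5, 10, 10, 10]
values = bspline_basis(full_knots, 3, 3.3)   # six basis values at x = 3.3
```

Because every step forms a combination of non-negative quantities with non-negative factors, the computed values stay non-negative and sum to one at any point of the interval.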

Valuable properties of $s$ can be deduced [12] from those of the B-splines. A useful
property is that, for any $x \in I$, $s(x)$ is a convex combination of the coefficients of the
B-splines whose support contains $x$. Thus, local bounds for $s$ can readily be found:

$$\min_{j < k \le j+n} c_k \le s(x) \le \max_{j < k \le j+n} c_k, \qquad x \in [\lambda_j, \lambda_{j+1}].$$

These bounds imply a mimicking property for $s$, viz., that the elements of $c$ tend to
vary in much the same way that $s$ varies. Figure 2 depicts a spline curve $s$ of order
4 with “non-polynomial” shape having interior knots at $x = (1, 2, 5)^T$, coincident end
knots at $x = 0$ and 10, and B-spline coefficients $(0.00, 0.20, 0.60, 0.22, 0.18, 0.14, 0.12)^T$.
To reproduce this shape to visual accuracy with a polynomial would require a high
degree and hence many more defining coefficients. The mimicking property is evident:
successive elements of $c$ rise, fall sharply and then gently, behaving in a similar way to $s$.

Fig. 2. A spline curve with “non-polynomial” shape illustrating the mimicking property.


3  Fixed-  and  free-knot  approximation 

Two types of data approximation (or data modelling) in the $\ell_2$ norm by splines are
regularly considered. One is the determination of the B-spline coefficients $c$ for given
data, a prescribed order $n$ and prescribed knots $\lambda$. The other is the determination of $c$
and $\lambda$ for given data and spline order $n$. The former problem is linear with respect to
the parameters of the spline, just $c$ being regarded as unknown. The latter is nonlinear,
both $c$ and $\lambda$ being unknown.

The linear case is well understood, with highly satisfactory algorithms [10] and software
implementations [1, 16] available. The nonlinear case remains a research problem,
although useful algorithms (Section 8) have been proposed, implemented and used. Many
of these algorithms “iterate” with respect to $\lambda$, where for each choice of knots the
resulting linear problem is solved for $c$. Thus, the linear problem (Section 4) is important
in its own right and as part of the solution strategy for knot-placement algorithms.

4  Least-squares  data  approximation  by  splines  with  fixed  knots 

The $\ell_2$ data approximation problem for splines with fixed knots can be posed as follows.

Given are data points $\{(x_i, y_i)\}_1^m$, with $x_1 \le \cdots \le x_m$, and corresponding weights
$\{w_i\}_1^m$ or standard uncertainties $\{u_i\}_1^m$. The $w_i$ reflect the relative quality of the $y_i$;¹ $u_i$
is the standard uncertainty of $y_i$ and corresponds to the standard deviation of possible
“measurements” at $x = x_i$ of the function underlying the data, $y_i$ being one realisation.

Given also are the $N$ knots $\lambda = \{\lambda_j\}_1^N$ and the order $n$ of the spline $s$.

When weights are specified, the problem is to determine the spline $s(x)$ of order
$n$, with knots $\lambda$, such that the two-norm of $\{w_i (y_i - s(x_i))\}_1^m$ is minimised with respect to $c$.
When standard uncertainties are specified, the two-norm of $\{u_i^{-1} (y_i - s(x_i))\}_1^m$ is minimised with
respect to $c$. If $w_i = u_i^{-1}$, $i = 1, \ldots, m$, the two formulations are identical in terms of the
spline produced. When weights are specified, $s$ is referred to as a spline approximant.
When uncertainties are prescribed, $s$ is known as a spline model. There are differences
(Section 5) in interpretation in terms of the statistical uncertainties associated with the
solution and in terms of validating the spline model so obtained.

¹ The $x_i$ are taken as exact for the treatment here. A generalised treatment is possible, in which the $x_i$ are also
regarded as inexact. The problem becomes nonlinear (in $c$).

The use of a formulation in terms of standard uncertainties, together with the B-spline
representation (2.1) of $s$, gives the linear algebraic formulation²

$$\min_{c}\; e^T V_y^{-1} e, \qquad e = y - Ac, \qquad (4.1)$$

where $y = (y_1, \ldots, y_m)^T$, $A$ is an $m \times q$ matrix with $a_{i,j} = N_{n,j}(x_i)$, and
$V_y = \mathrm{diag}(u_1^2, \ldots, u_m^2)$. Matrix computational methods can be applied to this formulation.

As a consequence of property (2.2) of the B-splines, $A$ is a rectangular banded matrix
of bandwidth $n$ [8].

The linear algebraic solution can be effected using Givens rotations to triangularise
the system, back-solution then yielding the coefficients $c$ [6]. The number of floating-point
operations (flops) required is to first order $O(mn^2)$, i.e., independent of the number
of knots. Hence computing a spline model for many knots is hardly more expensive than
one for a few knots. Moreover, since for many problems cubic splines ($n = 4$) yield a
good balance between approximation properties and smoothness (continuity class $C^2$),
regarding the order as fixed gives a flop count $O(m)$.

The vector $c$ is unique [11] if there is a strictly ordered subset $t = \{t_j\}_1^q$ of $x$ such
that the Schoenberg-Whitney conditions [21]

$$t_j \in \mathrm{supp}(N_{n,j}(\lambda; x)), \qquad j = 1, \ldots, q, \qquad (4.2)$$

hold. In a case where the conditions (4.2) do not hold,³ an appropriate member can be
selected from the space of possible solutions. Such a selection is also advisable if the
conditions are in a practical sense “close” to being violated. A particular solution can
be determined by augmenting the least-squares formulation by a minimal number of
equality constraints for $c$ such that $A$ has full column rank [10].

An instance of the type of data set to which the algorithms of this paper are addressed
is shown in Figure 3. Such a data set (cf. Section 2) has the variety of behaviour that
cannot readily be reproduced by some other classes of approximating functions.

5  Spline  uncertainties 

Once a valid spline model has been obtained, the uncertainties associated with the
spline can be evaluated [9]. Uncertainty evaluations are essential in metrology, where all
measurement results are to be accompanied by a quantification of their reliability [2],
and important in other fields. The key entity is the covariance matrix $V_c$ of the spline
coefficients $c$. Using recognised procedures of linear algebra,

$$V_c = (A^T V_y^{-1} A)^{-1}. \qquad (5.1)$$

² A further generalisation is possible in which mutual dependencies are permitted among the measurement errors.
In this case, $V_y$ is non-diagonal.

³ A set of knots giving rise to this circumstance may be a consequence of an automatic knot-placement procedure.

[Figure 3: raw data]

Fig. 3. A data set representing heat flow as a function of temperature. Such data forms
the basis of the determination of thermophysical properties of materials under test. For
clarity only every fifth data point is shown.

From this result, the standard uncertainty of any quantity that depends on $c$ can be
evaluated. Specifically, for a given constant vector $p$, the standard uncertainty $u(p^T c)$
of $p^T c$ is given by

$$u^2(p^T c) = p^T V_c\, p.$$

By setting $p$ to contain the values of the B-spline basis at a point $x \in I$, the standard
uncertainty of $s(x)$ can be formed. The standard uncertainty of a nonlinear function of
$c$ can be estimated by first linearising the expression about the solution value of $c$.

If weights rather than uncertainties are specified for the data, (5.1) takes the form

$$V_c = \hat{\sigma}^2 (A^T W^2 A)^{-1},$$

where $\hat{\sigma}$ estimates the standard deviation of the weighted residuals, $W = \mathrm{diag}(w_1, \ldots, w_m)$, and

$$\hat{\sigma}^2 = e^T W^2 e / (m - q)$$

evaluated at the solution.
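The evaluation of (5.1) and of $u(p^T c)$ can be sketched numerically as below. This is a minimal illustration under assumptions: any full-column-rank design matrix may be passed in, and the usage lines employ a small straight-line matrix as a stand-in for the B-spline matrix $A$ of (4.1); the function names are hypothetical.

```python
import numpy as np

def fit_with_uncertainty(A, y, u):
    """Weighted least squares c and covariance Vc = (A^T Vy^-1 A)^-1, Vy = diag(u^2)."""
    Aw = A / u[:, None]                  # scale row i by 1 / u_i
    yw = y / u
    c, *_ = np.linalg.lstsq(Aw, yw, rcond=None)
    Vc = np.linalg.inv(Aw.T @ Aw)        # formula (5.1); assumes full column rank
    return c, Vc

def standard_uncertainty(p, Vc):
    """u(p^T c) from u^2(p^T c) = p^T Vc p."""
    return float(np.sqrt(p @ Vc @ p))

# Stand-in example: straight line through five points with u_i = 0.1.
x = np.arange(5.0)
A = np.column_stack([np.ones_like(x), x])
y = 2.0 + 3.0 * x
u = np.full(5, 0.1)
c, Vc = fit_with_uncertainty(A, y, u)
u_at_2 = standard_uncertainty(np.array([1.0, 2.0]), Vc)   # uncertainty of the fit at x = 2
```

For a spline, `p` would hold the B-spline basis values at the point of interest, exactly as described in the text.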

6  Families  of  approximants 

When dealing with certain classes of approximating function it is natural and useful to
consider families of approximants. A simple example is polynomial approximation, for
polynomials $p_j(x)$ of order $j = 1, 2, \ldots, N$, for some maximum order $N$. Each member
of the family “contains” the previous member. It is then meaningful to consider the
approximation measure, e.g., the $\ell_2$-norm here, with respect to indices denoting members.
Thus, the value of the $\ell_2$-norm for the polynomial approximant of order $j$ can be
inspected with respect to index $j$ for $j = 1, 2, \ldots, N$. For data approximation, it is more
meaningful to use as the measure the root-mean-square residual given by dividing the
$\ell_2$-norm by $(m - j)^{1/2}$. For representative data, the expectation is that as $j$ increases
this quantity should stabilise to an essentially constant value. This property provides a
useful validation procedure. If weights $u_i^{-1}$ are used as in Section 4 this measure should
settle to the value unity. Thus the approximant with index $j$ (normally the smallest
such) that achieves the value one is sought.

Within most of the strategies outlined in Section 8 it is possible to produce results
for $N = 1, 2, \ldots$ knots, and thus to study the effect of the number of knots on the quality
of the approximant. From such information it may be possible to select an acceptable
solution. If for each number of knots, the knots contain those for the previous number,
and an $\ell_2$ approximant is determined, the sequence of approximants for $N = 1, 2, \ldots$
knots forms a family. A family has the property that the sequence of values of the
$\ell_2$-norm is monotonically decreasing.
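The validation procedure for a family can be sketched for the polynomial example mentioned above. This is an illustration only: the synthetic cubic data, the noise level and the function name `family_rms` are assumptions made for the sketch, not part of the source.

```python
import numpy as np

def family_rms(x, y, u, max_order):
    """Weighted l2-norm and RMS residual for polynomial approximants of order 1..max_order."""
    norms, rms = [], []
    for j in range(1, max_order + 1):
        # Order j means degree j - 1; rows are scaled by the weights 1 / u_i.
        A = np.vander(x, j, increasing=True) / u[:, None]
        c, *_ = np.linalg.lstsq(A, y / u, rcond=None)
        r = y / u - A @ c
        norms.append(float(np.linalg.norm(r)))
        rms.append(norms[-1] / np.sqrt(len(x) - j))
    return norms, rms

# Synthetic data: a cubic plus noise of known standard deviation 0.05.
rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 200)
u = np.full_like(x, 0.05)
y = 1.0 + 2.0 * x - 3.0 * x**3 + rng.normal(scale=0.05, size=x.size)
norms, rms = family_rms(x, y, u, 8)
```

In this setting `norms` cannot increase with the order, and `rms` is expected to level off near one from order 4 (the cubic) onwards.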

7  Least-squares  data  approximation  by  splines  with  free  knots 

The problem of least-squares data approximation by splines with free knots can be
formulated in the same way as that for fixed knots (Section 4), except that the knots
are not specified a priori, either in location or number. The formulation (4.1) no longer
yields a linear problem, since the matrix $A$ of B-spline values is now a function of $\lambda$.
Instead, $e(\lambda) = y - A(\lambda) c$, and it is required to solve

$$\min_{\lambda, c}\; e^T(\lambda) V_y^{-1} e(\lambda). \qquad (7.1)$$

In order to reflect the fact that for any given knot set the B-spline coefficients are given
by solving a relatively simple, linear problem, formulation (7.1) can be expressed as

$$\min_{\lambda} \Big( \min_{c}\; e^T(\lambda) V_y^{-1} e(\lambda) \Big). \qquad (7.2)$$

Extensive  use  is  made  of  this  elementary  result. 
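The nested structure of (7.2) can be sketched for a single free knot. This is a simplified illustration under assumptions: a linear spline in the truncated power basis replaces the B-spline basis for brevity, the outer minimisation is a scan over a user-supplied list of candidate knots, and the function names are hypothetical.

```python
import numpy as np

def inner_residual(x, y, u, knot):
    """Inner problem of (7.2): solve for c with the knot fixed, return e^T Vy^-1 e."""
    # Linear spline with one knot in the truncated power basis (an assumption).
    A = np.column_stack([np.ones_like(x), x, np.maximum(x - knot, 0.0)])
    Aw, yw = A / u[:, None], y / u
    c, *_ = np.linalg.lstsq(Aw, yw, rcond=None)
    e = yw - Aw @ c
    return float(e @ e)

def best_single_knot(x, y, u, candidates):
    """Outer problem of (7.2), restricted to a finite set of candidate knots."""
    values = [inner_residual(x, y, u, lam) for lam in candidates]
    return candidates[int(np.argmin(values))]

x = np.linspace(0.0, 1.0, 101)
y = np.abs(x - 0.3)                       # exactly a linear spline with a knot at 0.3
u = np.ones_like(x)
knot = best_single_knot(x, y, u, np.linspace(0.1, 0.9, 9))
```

The outer objective seen by the scan is a function of the knot alone, which is the reason the nested formulation is convenient for knot-placement algorithms.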

8  Knot-placement  strategies 

Many  knot-placement  strategies  have  been  proposed  and  used.  Some  of  these  strategies 
are  outlined  and  their  properties  indicated.  Several  of  the  strategies  generate  a  family  of 
candidate  spline  approximants,  with  advantages  for  model  validation. 

8.1  Manual  methods 

Manual  methods  can  be  classed  as  those  methods  for  which  the  user  examines  the  general 
“shape”  of  the  function  underpinning  the  data,  selecting  the  number  and  location  of  the 
knots  on  this  basis.  With  practice  and  visual  aids,  acceptable  solutions  can  often  be 
obtained  [6].  Naturally,  knots  are  chosen  to  be  more  concentrated  where  “things  are 
happening”  in  contrast  to  regions  where  the  underpinning  behaviour  is  innocuous. 

8.2  Strategies  that  depend  only  on  abscissa  values 

Strategies based on the manner in which the values of the independent variable are
distributed may be used to place the knots (at points that are not necessarily the data
abscissae themselves). A facility in DASL (the NPL Data Approximation Subroutine
Library) [1] provides one such strategy, based on the Schoenberg-Whitney conditions
(4.2) in the following way. Intuitively, these conditions imply that there is no region where
there are “too many” knots compared with the number of data points. Mathematically,
these conditions guarantee uniqueness. Numerically, their satisfaction does not ensure
that the solution is well-defined. If the conditions are “close” to being violated, $c$ will be
sensitive to perturbations in the data. In particular, since the behaviour of $c$ “controls”
that of $s$ (Section 2), the spline is likely to exhibit spurious behaviour such as large
undesirable oscillations if $\|c\|_2 \gg \|y\|_2$.

It follows that a sensible choice of knots would be such that the Schoenberg-Whitney
conditions are satisfied “as well as possible” for a data subset. Such a choice is made in
DASL [1] for spline approximation of arbitrary order. It is also made in a cubic spline
interpolation routine in the NAG Library [16], regarding spline interpolation as a special
case of spline approximation in which $q = m$ and $N = m - n$. The choice made is seen
most simply by first applying it to spline interpolation. Consider the choice

$$\lambda_j = \tfrac{1}{2} \big( x_{j + \lfloor n/2 \rfloor} + x_{j + \lfloor (n+1)/2 \rfloor} \big), \qquad j = 1, \ldots, m - n,$$

where $\lfloor u \rfloor$ is the largest integer no larger than $u$. For $n$ even, $\lambda_j = x_{j+n/2}$. Thus, the choice
$t_j = \lambda_{j-n/2}$ would be made. However (Section 2), $\mathrm{supp}(N_{n,j}) = [\lambda_{j-n}, \lambda_j]$. Thus,
index-wise, the Schoenberg-Whitney conditions are satisfied as well as possible in the sense
that the index of $\lambda_{j-n/2}$ falls halfway between the indices of the support endpoints $\lambda_{j-n}$
and $\lambda_j$. Comparable considerations apply for $n$ odd. Precisely this choice is recommended
[14, 16] in the context of cubic spline interpolation. It is the “not a knot” criterion, as
a practical alternative to the classical use of boundary derivatives. A knot is placed at
each “interior” data value $x_i$ apart from $x_2$ and $x_{m-1}$.

The above choice can be interpreted as follows. Consider the graph $x = F(\ell)$ given
by the join of the points $\{(i, x_i)\}_1^m$. The $j$th interior knot, $\lambda_j$, for $j = 1, \ldots, m - n$,
is given by $F(j + n/2)$. The successive spacings between the index arguments of $F$ for
$j = 0, \ldots, N + 1$, using $F(0) = x_{\min}$ and $F(N + 1) = x_{\max}$, are therefore

$$1 + n/2, \; \underbrace{1, \ldots, 1}_{N - 1}, \; 1 + n/2.$$

For approximation, these successive spacings are proportionally increased to account for
the fact that there are fewer knots. The resulting expression for the $j$th interior knot is

$$\lambda_j = F\big( 1 + (m - 1)(j + n/2 - 1)/(q - 1) \big), \qquad j = 1, \ldots, N.$$

The choice can be interpreted as placing the interior knots such that there is an
approximately equal number of data points in each knot interval (interval between adjacent
knots), except that in the first and the last interval there are approximately $n/2$ times
as many points. The strategy [1] has the property that when $N$ is such that the data is
interpolated, the choice of knots agrees with one of the recommended choices for spline
interpolation.⁴
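The expression for the $j$th interior knot can be evaluated directly, as in the sketch below. It is an illustration rather than the DASL routine: the function name `abscissa_knots` is hypothetical, and $F$ is realised with linear interpolation through the points $(i, x_i)$. The usage lines take the abscissae of the Figure 4 illustration.

```python
import numpy as np

def abscissa_knots(x, n, N):
    """Interior knots lambda_j = F(1 + (m - 1)(j + n/2 - 1)/(q - 1)), j = 1..N, q = N + n."""
    x = np.asarray(x, dtype=float)
    m, q = len(x), N + n
    j = np.arange(1, N + 1)
    arg = 1 + (m - 1) * (j + n / 2 - 1) / (q - 1)
    # F is the piecewise linear join of the points (i, x_i), i = 1, ..., m.
    return np.interp(arg, np.arange(1, m + 1), x)

x = np.array([0, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2, 3, 4, 5, 7.5, 10])
interp_knots = abscissa_knots(x, 4, 10)    # interpolation case, q = m
approx_knots = abscissa_knots(x, 4, 4)     # approximation with four knots
edges = np.concatenate(([x[0]], approx_knots, [x[-1]]))
counts, _ = np.histogram(x, bins=edges)    # data points per knot interval
```

In the interpolation case the knots fall on $x_3, \ldots, x_{12}$, and in the approximation case the end intervals hold twice as many points as the inner ones.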

⁴ The approach tends to give better knot locations if the data is gathered in a manner which ensures that the local
density of the data is greater in regions where the behaviour of $y$ is more marked.

Figure 4 illustrates the above strategy for a spline interpolant and approximant of
order 4 to data with abscissae $x = (0, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2, 3, 4, 5, 7.5, 10)^T$.
Each figure shows the graph $x = F(\ell)$. For the interpolant (left-hand graph), ten knots
are chosen to coincide with the abscissa values $x_3, \ldots, x_{12}$. For the approximant
(right-hand graph), four knots are chosen such that there are two points in each interval,
excepting the first and last interval where there are four points, i.e., $n/2 = 2$ times as
many. The distribution of the knots reflects that of the abscissa values.

Fig. 4. A knot placement strategy depending only on the abscissa values.

A simpler strategy is to select uniformly spaced knots. The Schoenberg-Whitney
conditions will not necessarily automatically be satisfied by such a choice, and the spline
approximant would therefore not be unique, although the approach indicated at the end
of Section 4 could be applied.

8.3 Sequential knot-insertion strategies

In a sequential knot-insertion strategy, a succession of approximants is obtained, in which
for each approximant a knot is inserted in the knot interval that gives rise to the greatest
contribution to the $\ell_2$ error. A knot interval is an interval between adjacent knots, where
the endpoints of $I$ count as knots for this purpose. Previously inserted knots are retained
undisturbed. Several variants are possible (also see Section 8.10), e.g.:

• Start the process with a number of knots already in place, perhaps obtained from
information specific to the application.

• Candidate positions for a new knot are

* The continuum of points within the interval. The approach gives rise to the
minimisation of a univariate function that may possess local minima.

* The subset within the interval of a discrete set of points chosen a priori, e.g., the
data abscissae themselves or a uniformly spaced set of $x$-values. The approach
gives rise to a finite computation for the globally-best choice of knot, relative
to the discretisation, with respect to previous knots.

•  More  than  one  knot  can  be  inserted  at  a  time.  Doing  so  gives  an  approach  that 
is  intermediate  between  full  optimisation  (Section  8.6)  and  sequential  (single)  knot 
insertion.  Computation  times  rise  rapidly  with  the  number  of  “simultaneous”  knots 
so  inserted,  so  in  practice  only  a  small  number,  say  two  or  three,  might  be  feasible. 
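
The single-knot variant with a discrete candidate set can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used for Figure 5: it uses a truncated-power basis solved through the normal equations (adequate only for small demonstrations; a B-spline basis with an orthogonal factorisation would be preferred in practice), it takes the median data abscissa of the worst interval as the new knot, and the names `lstsq_coef`, `basis_row`, `residuals` and `insert_knots` are hypothetical.

```python
ORDER = 4  # cubic splines (order = degree + 1)

def lstsq_coef(rows, ys):
    """Least-squares coefficients from the normal equations (tiny problems only)."""
    k = len(rows[0])
    # Augmented matrix [G | b] with G = A^T A and b = A^T y.
    M = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * y for r, y in zip(rows, ys))] for i in range(k)]
    for c in range(k):                       # elimination with partial pivoting
        p = max(range(c, k), key=lambda q: abs(M[q][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, k):
            f = M[r][c] / M[c][c]
            M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    coef = [0.0] * k
    for i in reversed(range(k)):             # back substitution
        coef[i] = (M[i][k] - sum(M[i][j] * coef[j] for j in range(i + 1, k))) / M[i][i]
    return coef

def basis_row(x, knots):
    """Truncated-power basis: 1, x, ..., x^(ORDER-1), (x - knot)_+^(ORDER-1)."""
    return ([x ** p for p in range(ORDER)]
            + [max(x - k, 0.0) ** (ORDER - 1) for k in knots])

def residuals(xs, ys, knots):
    rows = [basis_row(x, knots) for x in xs]
    coef = lstsq_coef(rows, ys)
    return [y - sum(c * v for c, v in zip(coef, row)) for row, y in zip(rows, ys)]

def insert_knots(xs, ys, num_new, knots=()):
    """Return [(knots, sum of squared residuals)] after 0, 1, ..., num_new insertions."""
    knots = sorted(knots)
    history = []
    for step in range(num_new + 1):
        res = residuals(xs, ys, knots)
        history.append((list(knots), sum(r * r for r in res)))
        if step == num_new:
            break
        ends = [xs[0]] + knots + [xs[-1]]    # interval endpoints count as knots
        best = None
        for a, b in zip(ends, ends[1:]):
            inside = [i for i, x in enumerate(xs) if a < x < b]
            if len(inside) < 2 * ORDER + 1:  # keep >= ORDER points on each side
                continue
            contribution = sum(res[i] ** 2 for i in inside)
            if best is None or contribution > best[0]:
                best = (contribution, xs[inside[len(inside) // 2]])
        if best is None:
            break
        knots = sorted(knots + [best[1]])    # earlier knots are left undisturbed
    return history

# Illustrative data only (not the thermophysical data of Figure 3).
xs = [(i - 100) / 100 for i in range(201)]
ys = [1.0 / (1.0 + 25.0 * x * x) for x in xs]
history = insert_knots(xs, ys, 4)
for knots, sse in history:
    print(len(knots), (sse / len(xs)) ** 0.5)   # number of knots, RMS residual
```

Because earlier knots are retained, the spline spaces are nested and the residual cannot increase from one stage to the next, which is the behaviour the "upper set" of a plot such as Figure 5 would display.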

The  “upper  set”  of  crosses  in  Figure  5  shows  the  root-mean-square  residual  as  a  function 
of  the  number  of  knots  for  the  application  of  this  strategy  to  the  thermophysical  data 
of  Figure  3. 


Fig. 5. The root-mean-square residual as a function of the number of knots for the
application of knot-insertion and knot-removal strategies to the thermophysical data of
Figure 3. The “upper set” of crosses indicates the values obtained for knot insertion and
the lower set those for knot removal. The knot-removal strategy starts with the knot set
provided by the knot-insertion strategy, which was terminated after 81 knots had been placed.
The figure depicts the root-mean-square residual on a logarithmic scale, so its value
varies by a factor of 1000 from 1 to 81 knots.


8.4  Sequential  knot-removal  strategies 

In  a  sequential  knot-removal  strategy,  the  starting  point  is  an  initial  spline  approximant 
having  a  “large”  number  of  knots  that  typically  would  be  regarded  as  an  acceptable 
approximant  to  the  data  and  that  contains  (perhaps  many)  more  knots  than  desired.  Also 
see  Section  8.10.  Each  successive  approximant  is  obtained  from  the  previous  approximant 
by  deleting  one  (or  more)  knots.  The  knot  selected  for  removal  is  chosen  as  that  having 
least effect in terms of the change in the $\ell_2$ error. The process is continued until an
acceptable  approximant  is  no  longer  obtained. 

The initially large number of knots (Section 8.10) provides an appreciable number
of candidate knots for removal and thus greater flexibility. The rationale is that, in
contrast to successive knot insertion, a succession of acceptable approximants is obtained
as opposed to a succession of unacceptable approximants, until the final “solution” is
provided. There are variants, as with sequential knot insertion. For example, several
knots can be removed at each stage.
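
A minimal sketch of single-knot removal follows, under the same simplifying assumptions as before (truncated-power basis, normal equations, small problems only). The acceptance criterion `max_sse`, a bound on the sum of squared residuals, is a hypothetical stand-in for whatever notion of "acceptable approximant" the application supplies, and the function names are illustrative.

```python
ORDER = 4  # cubic splines (order = degree + 1)

def lstsq_coef(rows, ys):
    """Least-squares coefficients from the normal equations (tiny problems only)."""
    k = len(rows[0])
    M = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * y for r, y in zip(rows, ys))] for i in range(k)]
    for c in range(k):                       # elimination with partial pivoting
        p = max(range(c, k), key=lambda q: abs(M[q][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, k):
            f = M[r][c] / M[c][c]
            M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    coef = [0.0] * k
    for i in reversed(range(k)):             # back substitution
        coef[i] = (M[i][k] - sum(M[i][j] * coef[j] for j in range(i + 1, k))) / M[i][i]
    return coef

def spline_sse(xs, ys, knots):
    """Sum of squared residuals of the least-squares spline with the given knots."""
    rows = [[x ** p for p in range(ORDER)]
            + [max(x - k, 0.0) ** (ORDER - 1) for k in knots] for x in xs]
    coef = lstsq_coef(rows, ys)
    return sum((y - sum(c * v for c, v in zip(coef, row))) ** 2
               for row, y in zip(rows, ys))

def remove_knots(xs, ys, knots, max_sse):
    """Delete, one at a time, the knot whose removal changes the error least."""
    knots = sorted(knots)
    history = [(list(knots), spline_sse(xs, ys, knots))]
    while knots:
        trials = [(spline_sse(xs, ys, knots[:i] + knots[i + 1:]), i)
                  for i in range(len(knots))]
        sse, i = min(trials)
        if sse > max_sse:                    # next approximant would be unacceptable
            break
        knots = knots[:i] + knots[i + 1:]
        history.append((list(knots), sse))
    return history

# Illustrative data: a cubic spline whose only genuine knot is at x = 0.
xs = [(i - 100) / 100 for i in range(201)]
ys = [max(x, 0.0) ** 3 for x in xs]
history = remove_knots(xs, ys, [-0.5, 0.0, 0.5], max_sse=1e-10)
print(history[-1][0])
```

In this constructed example the two redundant knots are removed and the knot at the origin survives, since deleting it would leave a single cubic polynomial that cannot reproduce the data.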

A different class of knot removal algorithms [20] is based on a general class of $\ell_p$ norms.
It  is  not  concerned  specifically  with  data  approximation,  but  with  replacing  an  initial 
spline  approximant  (that  may  correspond  to  an  approximant)  by  one  that  is  acceptably 
close  according  to  the  measure. 

The  “lower  set”  of  crosses  in  Figure  5  shows  the  root-mean-square  residual  as  a 
function  of  the  number  of  knots  for  the  application  of  this  strategy  to  the  thermophysical 
data  part-depicted  in  Figure  3. 

8.5  Theory-based approaches

The distance of a spline $s(x)$ with knots $\lambda$ from a sufficiently differentiable function
$f(x)$ is proportional to $h^n |f^{(n)}(\xi)|$, where $h$ is the local knot spacing and $\xi$ is a value of
$x$ [14]. Consider inverting this expression in order approximately to equalise the error
with respect to $x$. The lengths of the knot intervals should consequently be chosen to be
proportional to $|f^{(n)}(\xi)|^{-1/n}$, where $\xi$ is a value in the neighbourhood of the respective
knot interval. Consider the function

$$F(x) = \int_{x_{\min}}^{x} |f^{(n)}(t)|^{1/n} \, dt \Big/ \int_{x_{\min}}^{x_{\max}} |f^{(n)}(t)|^{1/n} \, dt. \qquad (8.1)$$

Take knots given by

$$F(\lambda_j) = \frac{j}{N + 1}, \qquad j = 1, \ldots, N. \qquad (8.2)$$

This result corresponds to dividing the range of the monotonically increasing function
$F(x)$, for $x \in I$, into $N + 1$ contiguous subranges of equal length, taking the values of $x$
corresponding to the subrange endpoints as the knots.
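
As an illustration of (8.1) and (8.2), the following sketch assumes that an estimate of $f^{(n)}$ is already available as a piecewise-constant function on given breakpoints. The integral in (8.1) is then piecewise linear and can be inverted exactly. The function name `equidistribute_knots` and its arguments are hypothetical.

```python
from bisect import bisect_right

def equidistribute_knots(breaks, deriv_values, n, N):
    """Return N interior knots lambda_j satisfying F(lambda_j) = j / (N + 1).

    breaks       : increasing breakpoints x_min = b_0 < ... < b_K = x_max
    deriv_values : K piecewise-constant estimates of f^(n), one per interval
    n            : spline order
    """
    # Unnormalised numerator of (8.1) evaluated at each breakpoint.
    cum = [0.0]
    for left, right, d in zip(breaks, breaks[1:], deriv_values):
        cum.append(cum[-1] + abs(d) ** (1.0 / n) * (right - left))
    total = cum[-1]
    if total <= 0.0:
        raise ValueError("derivative estimate is identically zero")
    knots = []
    for j in range(1, N + 1):
        target = total * j / (N + 1)          # F(lambda_j) * total
        k = min(bisect_right(cum, target) - 1, len(breaks) - 2)
        density = abs(deriv_values[k]) ** (1.0 / n)
        knots.append(breaks[k] + (target - cum[k]) / density)
    return knots

# A constant derivative estimate gives uniformly spaced knots.
print(equidistribute_knots([0.0, 1.0, 2.0], [1.0, 1.0], n=4, N=3))
# A derivative 16 times larger on [0, 1] halves the knot spacing there (16**(1/4) = 2).
print(equidistribute_knots([0.0, 1.0, 2.0], [16.0, 1.0], n=4, N=2))
```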

In practice $f$, let alone $F$, is unknown. Various efforts have been made to estimate $f$
and hence $F$ from the data points. For instance, if the data is approximated by a spline of
order $n + 1$, its $n$th derivative, a piecewise-constant function, can be used to estimate $F$
[3]. It is then straightforward to form the required knots. The approach begs the question
in the case of data. In order to estimate knots for a spline of order $n$, it is first necessary
to construct a spline approximant of order $n + 1$ for the data, the construction of which
itself requires a choice of knots.

Alternatively [13], a spline approximant of order $n$ for the data can be constructed
for some convenient choice of knots. Its $n$th derivative is of course zero (except at the
knots). However, its $(n - 1)$th derivative is piecewise constant, a function that can be
approximated by the join of the mean values at the knots of the constant pieces to the
immediate right and left, with special consideration at the endpoints of $I$. The derivative
of this piecewise-linear function then provides a piecewise-constant representation of the
$n$th derivative, that can be used as before. Knots can then be deduced from this form
as above. The advantage of this approach is that it can be iterated [13]. If the process
“converges”, the result can be used to provide the required knot set. The process can
work well, but is capable of producing disappointing results. Several variants of the basic
concept are possible. The approach warrants careful re-visiting.

8.6  “Overall”  optimisation  approaches 

For  any  given  value  of  N,  the  problem  is  regarded  as  an  optimisation  problem  with 
respect  to  the  overall  error  measure.  It  is  necessary  to  provide  a  sensible  initial  estimate  of 
the  knot  positions.  Local  solutions  which  may  be  grossly  inferior  to  the  global  solution  are 
possible  [4].  At  an  optimal  solution,  knots  may  coalesce,  thus  reducing  the  continuity  of 
the  spline  at  such  points  [19];  the  same  comment  applies  to  the  sequential-knot-insertion 
and  optimisation  approach  (Section  8.7). 

8.7  Sequential  knot  insertion  and  optimisation 

Sequential  knot  insertion  with  optimisation  is  identical  to  the  sequential  knot-insertion 
strategy  (Section  8.3)  except  that,  after  each  knot  is  inserted,  all  previously-inserted 
knots  are  adjusted  such  that  the  complete  set  of  knots  at  that  stage  are  (locally)  optimal 
with respect to the overall error measure. One such strategy [15] carries out the
optimisation at each stage by adjusting in turn each knot in the current knot set in order to
achieve satisfactory reduction in the $\ell_2$ norm, and repeating the complete adjustment as
necessary.  This  strategy  is  not  as  poor  as  the  traditional  one-variable-at-a-time  strategy 
for  nonlinear  optimisation  because  knots  far  from  the  newly-inserted  knot  tend  to  have 
little  effect  on  the  error  measure. 

Buffering  to  prevent  knots  coalescing  and  reducing  the  continuity  of  the  approximant 
can  be  used.  Various  features  can  be  incorporated  to  improve  computational  efficiency, 
including  the  use  of  contemporary  nonlinear  least-squares  optimisation.  It  is  emphasised 
that  for  each  choice  of  knots  the  problem  is  linear  (cf.  Section  7). 

8.8  Optimal  discontinuous  piecewise-polynomial  approximation 

Consider the class $S_N$ of splines having $N$ interior knots of multiplicity $n$ (i.e., $nN$
interior knots in all, counting coincidences). An $s \in S_N$ will in general be discontinuous
at these knots. It is possible to determine the globally optimal locations of such knots,
using the principle of dynamic programming [4]. The approach is based on the fact that
the best approximant $s_N \in S_N$ to the leading $p$ ($> nN$) data points is given by the best
over $q = nN - n + 1, nN - n + 2, \ldots, p - N$ of $s_{N-1} \in S_{N-1}$ for the leading $q \le p - N$
points, together with a polynomial piece of order $n$ over points $q + 1$ to $p$. By this simple
recursive means the globally best knots for splines of any order that are discontinuous
at any number of knots can be computed.
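
The recursion can be sketched as a small dynamic programme. The sketch below is an illustration rather than the algorithm of [4]: it assumes distinct abscissae, requires every polynomial piece to contain at least $n$ data points (a simplification of the index ranges quoted above), and reports each break as the index at which a new piece starts. All function names are hypothetical.

```python
from functools import lru_cache

def lstsq_coef(rows, ys):
    """Least-squares coefficients from the normal equations (tiny problems only)."""
    k = len(rows[0])
    M = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * y for r, y in zip(rows, ys))] for i in range(k)]
    for c in range(k):                       # elimination with partial pivoting
        p = max(range(c, k), key=lambda q: abs(M[q][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, k):
            f = M[r][c] / M[c][c]
            M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    coef = [0.0] * k
    for i in reversed(range(k)):             # back substitution
        coef[i] = (M[i][k] - sum(M[i][j] * coef[j] for j in range(i + 1, k))) / M[i][i]
    return coef

def poly_sse(xs, ys, n):
    """Sum of squared residuals of the best polynomial of order n (degree n - 1)."""
    centre = sum(xs) / len(xs)               # centring improves conditioning
    rows = [[(x - centre) ** p for p in range(n)] for x in xs]
    coef = lstsq_coef(rows, ys)
    return sum((y - sum(c * v for c, v in zip(coef, row))) ** 2
               for row, y in zip(rows, ys))

def optimal_breaks(xs, ys, n, N):
    """Globally best split of the data into N + 1 polynomial pieces of order n.

    Returns (total SSE, list of the N indices at which a new piece starts).
    """
    m = len(xs)
    if m < n * (N + 1):
        raise ValueError("need at least n points per piece")

    @lru_cache(maxsize=None)
    def cost(i, j):                          # SSE of one piece over points i .. j-1
        return poly_sse(xs[i:j], ys[i:j], n)

    inf = float("inf")
    best = [[inf] * (m + 1) for _ in range(N + 1)]
    prev = [[0] * (m + 1) for _ in range(N + 1)]
    for p in range(n, m + 1):
        best[0][p] = cost(0, p)
    for k in range(1, N + 1):                # k breaks, k + 1 pieces
        for p in range((k + 1) * n, m + 1):
            for q in range(k * n, p - n + 1):
                total = best[k - 1][q] + cost(q, p)
                if total < best[k][p]:
                    best[k][p], prev[k][p] = total, q
    breaks, p = [], m
    for k in range(N, 0, -1):                # backtrack from the full data set
        p = prev[k][p]
        breaks.append(p)
    return best[N][m], breaks[::-1]

# Two straight lines with a jump between the 8th and 9th points.
xs = [float(i) for i in range(20)]
ys = [1.0 + x if x < 8 else 20.0 - 2.0 * x for x in xs]
sse, breaks = optimal_breaks(xs, ys, n=2, N=1)
print(sse, breaks)
```

Every candidate split is examined, so the result is globally optimal within the stated assumptions; the cost grows with the square of the number of data points for each additional break.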

Such  a  solution  may  not  be  suitable  as  the  final  result  in  an  application.  However, 
it  can  be  useful  as  part  of  a  knot  placement  strategy.  For  example,  suppose  good  knots 
for  a  spline  of  order  n  are  required.  An  approach  would  be  to  determine  an  optimal 
discontinuous spline of order $n + 1$. Use this spline to estimate $f$ in expression (8.1). The
integral in the numerator of (8.1) will be continuous piecewise linear and estimates of
the optimal knots for a $C^{(n-2)}$ spline readily obtained from (8.2). Mixed results have
informally  been  obtained  by  the  authors  with  an  implementation  of  this  approach.  It  is 
suggested  that  it  be  revisited. 

8.9  Knot dispersion

A set of knots of multiplicity $n$ is positioned using an appropriate strategy, such as
that in Section 8.8, and a $C^{(-1)}(I)$ spline with these knots determined. Each of these
multiple knots is “dispersed”, viz., replaced by $n$ nearby simple knots, and a replacement
$C^{(n-2)}(I)$ spline computed. A careful strategy for knot dispersion is required. Again,
informal experiments have been made by the authors and mixed results obtained.

8.10  Knot  initialisation  and  candidate  knot  locations 

Several  of  the  above  procedures  require  or  can  benefit  from  an  initial  placement  of  the 
knots. Some make use of “candidate knot locations”.

The  solution  to  the  free-knot  spline  approximation  problem  returned  by  iterative 
algorithms  typically  depends  on  the  starting  set  of  knots.  Although  an  algorithm  may 
return a result that satisfies the necessary and sufficient conditions for a solution [17], this
result  may  be  locally  rather  than  globally  optimal.  There  is  no  known  characterisation  of 
a  globally  optimal  solution.  The  careful  interpretation  of  solutions  is  therefore  important. 

The use of candidate knot positions can be helpful. For instance, it may be decided
that for splines of even order, only knots that coincide with data abscissae are in the
candidate set, or, for splines of odd order, knots only at points mid-way between
adjacent data abscissae may be so regarded. Such criteria are consistent with the choice for
interpolating splines and the generalisation covered in Section 8.2. The Lyche-Mørken
knot removal algorithms [20] use data abscissae as candidate knots. The use of a finite
number of candidate knot locations helps to reduce the dimensionality of the problem:
there can then only be a finite number of possible knot sets. For large $N$ this number can
be extremely large, making it prohibitive to examine all possibilities. However, for small
$N$, e.g., 1, 2 and 3, it may indeed be possible, and can pay dividends. Knot insertion
and knot removal algorithms can also implement the concept. For example, at each stage
of a knot insertion strategy, two or three knots can be inserted “simultaneously”. By
the method of their introduction these new knots will be optimal relative to the knots
previously used and the available candidate knot locations.
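
To give a feel for the numbers involved, the count of possible knot sets is a binomial coefficient. The sketch below assumes simple (non-coincident) knots and, purely for illustration, 98 candidate locations; `count_knot_sets` is a hypothetical helper.

```python
from math import comb

def count_knot_sets(num_candidates, num_knots):
    """Number of distinct sets of simple knots drawn from the candidate locations."""
    return comb(num_candidates, num_knots)

for num_knots in (1, 2, 3, 10, 30):
    print(num_knots, count_knot_sets(98, num_knots))
```

With 98 candidates there are 98, 4753 and 152096 knot sets for one, two and three knots respectively, each requiring one linear least-squares solve, whereas the counts for larger numbers of knots quickly become impractical to enumerate.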

Another aspect of a candidate knot set is that if it is sufficiently dense it will contain,
to a degree of approximation dictated by its “spacing”, the optimal knots for the given
data set [19]. For instance, consider a set of $m = 100$ data points specified over an interval
$I$ normalised to $[-1, 1]$. Take 100 uniformly spaced points spanning this interval. This set
will contain, to approximately two figures, each globally optimal knot set having $N \le 98$
knots⁵ (assuming all knots are simple). If a spline based on these 98 candidate interior
knots provided a valid model, a suitable knot removal algorithm might be expected to be
able to identify reasonably closely the optimal knot sets. Work is required to determine
the degree of success in this regard.
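
The "two figures" remark can be checked with a few lines: 100 uniformly spaced points on $[-1, 1]$ have spacing $2/99 \approx 0.02$, so any knot lying between the first and last interior candidates is within about 0.01 of a candidate. The helper names and the sample knot values below are illustrative assumptions.

```python
def uniform_candidates(a, b, count):
    """Return `count` uniformly spaced points spanning [a, b], endpoints included."""
    step = (b - a) / (count - 1)
    return [a + i * step for i in range(count)]

def snap(knot, candidates):
    """Nearest candidate location to an arbitrary knot position."""
    return min(candidates, key=lambda c: abs(c - knot))

points = uniform_candidates(-1.0, 1.0, 100)
interior = points[1:-1]                      # the endpoints are not interior knots
half_spacing = 1.0 / 99.0
for knot in (-0.63, 0.0, 0.5, 0.63):
    nearest = snap(knot, interior)
    print(knot, round(nearest, 4), abs(nearest - knot) <= half_spacing)
```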

9  Conclusions,  discussion  and  future  possibilities 

There are theoretical difficulties associated with existence, uniqueness and
characterisation of best free-knot $\ell_2$ spline approximants, which influence practical considerations.

5 The  two  endpoints  do  not  constitute  interior  knots. 
A best spline in the class of splines required may not exist. Take as $\{x_i\}_1^m$, $m = 21$
uniformly spaced values in $[-1, 1]$ and $y_i = |x_i|^3$. To see that a best $\ell_2$ spline $s$ of order 4
with three interior knots for this data may not exist, consider the choice $\lambda_1 = -\epsilon$, $\lambda_2 = 0$
and $\lambda_3 = \epsilon$. The $\ell_2$ error can be made smaller than any given $\delta > 0$ for some $\epsilon > 0$.
However, if the $\ell_2$ error is made zero by the choice $\epsilon = 0$, the resulting three coincident
knots at $x = 0$ mean that $s$ has lower continuity than the class of splines considered. In
practice, allowing knots to come “too close” together can introduce undesirable
“sharpness” into the approximant. Buffering of knots [15], to ensure a minimal separation, helps
in this regard. The use of a candidate knot set introduces a form of buffering. In some
circumstances the coalescing of knots would be ideal in terms of the resulting closeness
of $s$ to the data. In some applications the loss of smoothness would be unacceptable.
Therefore, whether buffering is appropriate depends on the use to be made of $s$.

The solution may not be unique. Figure 6 shows a set of 201 uniformly spaced points
in $[-1, 1]$ taken from $f(x) = \mathrm{sign}(x) \min(|x|, 1/2)$. Figure 7 shows the root-mean-square
residual as a function of knot location for $\ell_2$ splines of order 4 with one interior knot.
There are two best approximants, one with its knot at $x = -0.63$ and the other at
$x = +0.63$. One of the two approximants is shown in Figure 6. The other spline is its
skew-symmetric counterpart.
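
The symmetry behind this non-uniqueness can be reproduced with a short scan over knot locations. The sketch assumes the data are generated from $f(x) = \mathrm{sign}(x)\min(|x|, 1/2)$ at $x_i = (i - 100)/100$, uses a truncated-power basis with the normal equations (small problems only), and scans a grid of knot positions chosen for illustration; it does not attempt to reproduce Figure 7 itself.

```python
from math import sqrt

def lstsq_coef(rows, ys):
    """Least-squares coefficients from the normal equations (tiny problems only)."""
    k = len(rows[0])
    M = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * y for r, y in zip(rows, ys))] for i in range(k)]
    for c in range(k):                       # elimination with partial pivoting
        p = max(range(c, k), key=lambda q: abs(M[q][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, k):
            f = M[r][c] / M[c][c]
            M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    coef = [0.0] * k
    for i in reversed(range(k)):             # back substitution
        coef[i] = (M[i][k] - sum(M[i][j] * coef[j] for j in range(i + 1, k))) / M[i][i]
    return coef

def rms_residual(knot, xs, ys):
    """RMS residual of the order-4 least-squares spline with one interior knot."""
    rows = [[1.0, x, x * x, x ** 3, max(x - knot, 0.0) ** 3] for x in xs]
    coef = lstsq_coef(rows, ys)
    sse = sum((y - sum(c * v for c, v in zip(coef, row))) ** 2
              for row, y in zip(rows, ys))
    return sqrt(sse / len(xs))

xs = [(i - 100) / 100 for i in range(201)]
ys = [(1.0 if x > 0 else -1.0 if x < 0 else 0.0) * min(abs(x), 0.5) for x in xs]

knot_grid = [k / 100 for k in range(-90, 91)]
curve = [rms_residual(k, xs, ys) for k in knot_grid]
value, location = min(zip(curve, knot_grid))
print(location, value)                       # one of the (paired) minimisers on this grid
```

Because the data are skew-symmetric, the residual curve is symmetric about the origin, so any minimiser away from zero is accompanied by a mirror-image minimiser of equal quality.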


Fig. 6. 201 uniformly spaced points in $[-1, 1]$ taken from $f(x) = \mathrm{sign}(x) \min(|x|, 1/2)$
and a best $\ell_2$ spline approximant with one knot.

It is rarely required to determine an $\ell_2$ spline approximant that is globally or even
locally optimal with respect to its knots. An approximant that met some closeness
requirement with the smallest possible number of knots is an academic rather than a
pragmatic objective. Today, the more important consideration is to obtain an approximant
that represents the data in that its smoothness is consistent with that of the function
underlying the data and the uncertainties in the data. (This statement must be qualified
for situations where the continuity class of splines is a consideration as discussed
above.) These ends may be achieved by seeking an approximant with a reasonable but
not necessarily optimal number of knots.

Fig. 7. The root-mean-square residual as a function of knot location for $\ell_2$ spline
approximants with one knot to the data of Figure 6.

The  use  of  knot  removal  strategies  is  likely  to  attract  research  effort  in  the  future. 
One  reason  for  this  statement  is  that  the  need  to  work  with  large  initial  knot  sets  is 
not  as  computationally  prohibitive  with  today’s  powerful  personal  and  other  computers. 
Another  reason  is  that  the  approach  can  be  expected  to  produce  better  approximants, 
i.e., smaller $\ell_2$ errors for the same number of knots.

The  two  sets  of  crosses  in  Figure  5  correspond  to  the  values  of  the  root-mean-square 
residual  as  a  function  of  the  number  of  knots  for  the  application  of  the  knot-insertion 
strategy  followed  by  the  knot-removal  strategy  for  the  thermophysical  data  of  Figure  3. 
The two sets, where the “progress” takes place from left to right along the “top set”,
followed by right to left along the “bottom set”, constitute a form of hysteresis. The
behaviour in the two directions is distinctly different. In particular, the figure indicates
that once an acceptable approximation has been obtained by knot insertion, the use
of knot removal can deliver an approximation of comparable quality with many fewer
knots or, alternatively, for the same number of knots an appreciably better approximation
can be obtained. In this case, with 30 knots, knot removal gives an $\ell_2$ error that is one
quarter of that for knot insertion. For an $\ell_2$ error of 0.005, 30 knots are required using
knot removal and 43 using knot insertion.

Large  data  sets,  as  are  now  frequently  being  produced  in  metrology  from  computer- 
controlled  measuring  systems,  are  ideal  for  the  purpose  of  obtaining  a  sound  initial 
approximant  in  the  form  of  a  valid  model  containing  possibly  many  more  knots  than 
the  minimum  possible.  Their  size  permits  initial  approximants  to  be  obtained,  even  with 
large  numbers  of  uniformly  spaced  knots,  that  provide  valid  but  highly  redundant  models 
for the data. Because of the manner in which they are gathered, such sets do not contain
“appreciable gaps”; this fact, together with the quantity of data far outweighing the
initial number of knots, goes a long way towards ensuring that this
initial approximant is valid. There is much scope for an appreciable number of knots to
be  removed.  The  initial  large  number  of  knots  may  also  have  been  obtained  by  the  use 
of  a  knot  insertion  strategy.  It  is  the  experience  of  the  authors  that  knot  insertion  can 
introduce  appreciably  more  knots  than  given  by  the  optimal  choice. 

Because  the  early  approximants  may  be  far  from  optimal,  an  insertion  algorithm 
can  produce  knots  that  are  totally  different  from  those  in  an  optimal  approximant.  In 
contrast, a knot removal algorithm has the possibility of obtaining good knots. (See Section
8.10.)  For  instance,  because  of  the  sequential  manner  in  which  knots  are  inserted,  there 
may  be  two  or  more  close  or  even  coincident  knots,  although  a  good  knot  set  might  not 
have  this  property.  It  is  also  possible  that  such  knots,  although  not  part  of  an  optimal 
set,  are  influential  in  their  effect  on  a  knot  removal  algorithm,  with  the  result  that  they 
appear  in  the  “final”  approximant. 

The problem of data containing wild points is not addressed satisfactorily by existing
knot placement algorithms. Because such points are responsible for a large contribution
to the $\ell_2$ error, more knots would be placed in the neighbourhood of such a point than
would otherwise have been. The knot placement strategy can then be influenced
more by the errors in the data than by the properties of the underlying function.
Formulations and hence algorithms are needed that have greater resilience to such effects.

In solving the fixed-knot spline approximation problem as part of the free-knot
problem, a knot set differs from a previous knot set only by the addition or removal of a
small number of knots. In linear algebraic terms the “new” matrix $A(\lambda')$, say, differs in
only a few rows from the previous matrix $A(\lambda)$. Considerable gains in computational
efficiency can be obtained by accounting for this fact. This paper has not addressed this
issue, concentrating more on the concepts in the area. There is much scope, however,
for the application of the recognised stable updating and downdating techniques of
linear algebra [17]. Their application will not reduce the computational complexity of a
procedure, but could reduce computation times for large problems by an appreciable
factor.

The  work  described  here  was  supported  by  the  National  Measurement  System  Policy 
Unit  of  the  UK  Department  of  Trade  and  Industry  as  part  of  its  NMS  Software  Support 
for  Metrology  programme.  The  referee  provided  carefully  considered  comments  that 
permitted  the  paper  to  be  improved. 

Abstract

We discuss the relationship between the norm of the local discrete least squares
polynomial approximation operator, the minimal singular value $\sigma_{\min}(P_\Xi)$ of the matrix $P_\Xi$
of the evaluations of the basis polynomials, and the norming constant of the set of data
points $\Xi$ with respect to the space of polynomials. Since these three quantities are
equivalent up to bounded constants, and since $\sigma_{\min}(P_\Xi)$ can be efficiently computed, it is
feasible to use $\sigma_{\min}(P_\Xi)$ as a tool for distinguishing good local point constellations, which
is useful for scattered data fitting. In addition, we give a simple new proof of a bound
by Reimer for the norm of the interpolation operators on the sphere and extend it to
discrete least squares operators.

1  Introduction 

Let $\Omega$ be a bounded domain in $\mathbb{R}^d$, $d \ge 1$, and let $\Xi = \{\xi_1, \ldots, \xi_m\}$ be a set of scattered
points in $\Omega$. Given the values $f|_\Xi = (f(\xi_1), \ldots, f(\xi_m))^T$ of an otherwise unknown function
$f : \Omega \to \mathbb{R}$, we want to reconstruct $f$ from these data. The least squares method consists
in choosing some linearly independent functions $p_1, \ldots, p_n$ on $\Omega$, $n \le m$, and computing
the coefficients $a_1, \ldots, a_n \in \mathbb{R}$ that minimize the $\ell_2$ norm of the residual on $\Xi$,

$$\|f|_\Xi - p|_\Xi\|_2 = \Big( \sum_{i=1}^{m} |f(\xi_i) - p(\xi_i)|^2 \Big)^{1/2},$$

with $p = a_1 p_1 + \cdots + a_n p_n \in \mathcal{P} := \mathrm{span}\{p_1, \ldots, p_n\}$. Let $\mathcal{P}|_\Xi := \mathrm{span}\{p_1|_\Xi, \ldots, p_n|_\Xi\}$.

If $\dim \mathcal{P}|_\Xi = n$, then the least squares solution is unique, and we denote it by $L_{\mathcal{P},\Xi} f$.
Note that the minimum norm solution available in the case of a rank deficient problem
($\dim \mathcal{P}|_\Xi < n$) seems less useful since in general it does not reproduce the elements of $\mathcal{P}$
exactly.

The computation of the least squares approximation $L_{\mathcal{P},\Xi} f$ of $f$ is expensive if $m$ and
$n$ are large. To obtain a scattered data fitting algorithm with linear complexity with
respect to the size of data, a two-stage method [8] can be employed which consists in
1) covering the original domain $\Omega$ with a number of subdomains $\Omega_k$ each containing
only a small subset $\Xi_k = \Xi \cap \Omega_k$ of $\Xi$, computing local approximations to the data in
$\Xi_k$, and 2) using the information obtained from these local approximations to build the
final approximation of the (possibly huge) original data set. The least squares method
can be employed in the local approximation stage, especially to deal with “real world”
data usually contaminated with errors or just containing undesirable “high frequency”
components.

If $\mathcal{P}$ is chosen to be the space $\Pi_q^d$ of algebraic polynomials in $d$ variables of a suitable
degree $q$, then $n = \binom{d+q}{d}$. To achieve high approximation order, it is desirable to choose
$q$ such that $n$ is only a little smaller than $m$. However, this is not always possible due to
the rank deficiency or ill-conditioning of the least squares problem, which is especially
difficult to control if $\xi_1, \ldots, \xi_m \in \Xi_k$ are unevenly distributed in $\Omega_k$. This difficulty can
in principle be overcome by constructing, for each $\Xi_k$, a suitable subspace of higher
degree polynomials (least interpolation space [2]). If, however, the polynomial degree
is not allowed to exceed a fixed small value, then a common practical approach is to
choose larger sets $\Xi_k \subset \Xi$, with $m$ substantially greater than $n$, see e.g. [4] where it is
suggested to use for local least squares approximation $m = 11$ points if $\mathcal{P} = \Pi_2^2$ with
$n = 6$ and $m = 15$ points if $\mathcal{P} = \Pi_3^2$ with $n = 10$. However, even these higher $m$ provide
no guarantee that the matrix

$$P_{\Xi_k} := [p_j(\xi_i) : i = 1, \ldots, m,\ j = 1, \ldots, n]$$

of the local least squares problem will always be well-conditioned. Moreover, for some
data, this method may lead to the use of inappropriately distant points for the local
approximation.

The purpose of this paper is to draw attention to the fact that the conditioning of
the matrix $P_{\Xi_k}$ is not only an issue of numerical stability of the computation of least
squares. Indeed, the reciprocal of the minimal singular value $\sigma_{\min}(P_{\Xi_k})$ of $P_{\Xi_k}$ provides
a bound for the norm of the least squares operator $L_{\mathcal{P},\Xi_k}$ if both $m$ and $n$ are small.
Therefore, the approximation power of local least squares depends on $\sigma_{\min}(P_\Xi)$ and
the best approximation from $\mathcal{P}$. Since $\sigma_{\min}(P_\Xi)$ can be efficiently computed for a small
matrix $P_\Xi$ by well known numerical algorithms, it is feasible to use it as a tool to
decide whether a particular portion of data is suitable for building local least squares
approximation from $\mathcal{P}$ with reasonable approximation power. If $\sigma_{\min}(P_\Xi)$ is too small,
then either $\Xi$ or $\mathcal{P}$ should be modified, e.g. by adding more points to $\Xi$ or using an
appropriate subspace of $\mathcal{P}$. A two-stage algorithm for fitting large irregularly distributed
scattered data sets employing the conditioning of the local observation matrices $P_{\Xi_k}$ is
studied in [3, 5].

The paper is organized as follows. In Section 2 we discuss the relationship between
the norm of the discrete least squares approximation operator, the minimal singular
value $\sigma_{\min}(P_\Xi)$, and the norming constant $\nu(\mathcal{P}, \Xi)$. As a by-product, we obtain a new
proof of a known bound for the norm of the interpolation operators on the sphere [7],
and extend it to the discrete least squares operators. Section 3 illustrates the above
concepts in the univariate case, when they are also related to the separation distance of
$\Xi$, while Section 4 is devoted to a discussion of the least squares multivariate polynomial
approximation.

2  Bounds for $\|L_{\mathcal{P},\Xi}\|$ and approximation error

Let $p_1, \ldots, p_n$ be linearly independent continuous functions on $\Omega \subset \mathbb{R}^d$ spanning a linear
space $\mathcal{P}$. Since all norms on a finite dimensional linear space are equivalent, there are
positive constants $K_1$, $K_2$ such that

$$K_1 \|a\|_2 \le \Big\| \sum_{j=1}^{n} a_j p_j \Big\|_{C(\Omega)} \le K_2 \|a\|_2 \qquad (2.1)$$

for any coefficient vector $a = (a_1, \ldots, a_n)^T \in \mathbb{R}^n$.

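Given $\Xi = \{\xi_1, \ldots, \xi_m\} \subset \Omega$, we consider the matrix $P_\Xi \in \mathbb{R}^{m \times n}$ as defined in the
introduction. Obviously, $\mathrm{rank}\, P_\Xi = \dim \mathcal{P}|_\Xi$. If $P_\Xi$ has full rank, then $\dim \mathcal{P}|_\Xi = n$,
and the least squares approximation $L_{\mathcal{P},\Xi} f$ is uniquely determined, giving rise to the
operator $L_{\mathcal{P},\Xi} : C(\Omega) \to \mathcal{P} \subset C(\Omega)$.

It is easy to see that $L_{\mathcal{P},\Xi}$ exactly reproduces the elements of $\mathcal{P}$, i.e.,

$$L_{\mathcal{P},\Xi}\, p = p, \quad \text{all } p \in \mathcal{P}. \qquad (2.2)$$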

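Therefore, a standard argument shows that

$$\|f - L_{\mathcal{P},\Xi} f\|_{C(\Omega)} \le (1 + \|L_{\mathcal{P},\Xi}\|)\, E(f, \mathcal{P})_{C(\Omega)}, \qquad (2.3)$$

where $E(f, \mathcal{P})_{C(\Omega)}$ denotes the error of the best approximation of $f$ from $\mathcal{P}$ in Chebyshev
norm,

$$E(f, \mathcal{P})_{C(\Omega)} := \inf_{p \in \mathcal{P}} \|f - p\|_{C(\Omega)}.$$

Thus, an estimate for $\|L_{\mathcal{P},\Xi}\|$ immediately gives an upper bound for $\|f - L_{\mathcal{P},\Xi} f\|_{C(\Omega)}$.
The norming constant $\nu(\mathcal{P}, \Xi)$ of $\Xi$ with respect to $\mathcal{P}$ [6] can be defined by

$$\nu(\mathcal{P}, \Xi) = \min_{p \in \mathcal{P}} \|p|_\Xi\|_\infty / \|p\|_{C(\Omega)}. \qquad (2.4)$$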

Given any matrix $A$, we denote by $\sigma_{\min}(A)$ the minimal singular value

$$\sigma_{\min}(A) = \min_{\|x\|_2 = 1} \|Ax\|_2.$$

Recall that if $A$ has full rank, then $\sigma_{\min}(A) = \|A^+\|_2^{-1}$, where $A^+$ is the pseudoinverse
of $A$, see e.g. [1].

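Theorem 2.1 If $\mathrm{rank}\, P_\Xi = n$, then

$$K_1 / \sigma_{\min}(P_\Xi) \le \|L_{\mathcal{P},\Xi}\| \le K_2 \sqrt{m} / \sigma_{\min}(P_\Xi), \qquad (2.5)$$

$$1 / \nu(\mathcal{P}, \Xi) \le \|L_{\mathcal{P},\Xi}\| \le \sqrt{m} / \nu(\mathcal{P}, \Xi), \qquad (2.6)$$

$$K_1\, \nu(\mathcal{P}, \Xi) \le \sigma_{\min}(P_\Xi) \le K_2 \sqrt{m}\, \nu(\mathcal{P}, \Xi). \qquad (2.7)$$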

Proof: We first prove (2.5). Let $L_{\mathcal{P},\Xi} f = \sum_{j=1}^{n} a_j p_j$. It follows by a well-known result
in numerical linear algebra that the vector $a = (a_1, \ldots, a_n)^T$ can be computed as the
product of the pseudoinverse $P_\Xi^+$ of $P_\Xi$ with the vector $f|_\Xi$. Therefore,

$$\|a\|_2 = \|P_\Xi^+ f|_\Xi\|_2 \le \|P_\Xi^+\|_2 \|f|_\Xi\|_2 = \sigma_{\min}^{-1}(P_\Xi) \|f|_\Xi\|_2.$$


Approximation  power  of  local  least  squares 

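Since $\|L_{\mathcal{P},\Xi} f\|_{C(\Omega)} \le K_2 \|a\|_2$ and $\|f|_\Xi\|_2 \le \sqrt{m}\, \|f|_\Xi\|_\infty \le \sqrt{m}\, \|f\|_{C(\Omega)}$, the upper
bound in (2.5) follows. To prove the lower bound in (2.5), we choose a function $f \in C(\Omega)$
such that

$$\|P_\Xi^+ f|_\Xi\|_2 = \|P_\Xi^+\|_2 \|f|_\Xi\|_2, \qquad \|f|_\Xi\|_\infty = \|f\|_{C(\Omega)},$$

which is obviously possible. Then by (2.1) we have

$$\|L_{\mathcal{P},\Xi} f\|_{C(\Omega)} \ge K_1 \|P_\Xi^+ f|_\Xi\|_2 = K_1 \sigma_{\min}^{-1}(P_\Xi) \|f|_\Xi\|_2,$$

which implies the desired lower bound since $\|f|_\Xi\|_2 \ge \|f|_\Xi\|_\infty = \|f\|_{C(\Omega)}$.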

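Since $\|L_{\mathcal{P},\Xi} f\|_{C(\Omega)} \le \nu^{-1}(\mathcal{P}, \Xi)\, \|(L_{\mathcal{P},\Xi} f)|_\Xi\|_\infty$, the upper bound in (2.6) follows by

$$\|(L_{\mathcal{P},\Xi} f)|_\Xi\|_\infty \le \|(L_{\mathcal{P},\Xi} f)|_\Xi\|_2 \le \|f|_\Xi\|_2 \le \sqrt{m}\, \|f\|_{C(\Omega)}.$$

To prove the lower bound, we denote by $p$ an element of $\mathcal{P}$ for which the minimum in (2.4)
is attained and choose a function $f \in C(\Omega)$ such that $f|_\Xi = p|_\Xi$ and $\|f\|_{C(\Omega)} = \|f|_\Xi\|_\infty$.
Then by (2.2),

$$\|L_{\mathcal{P},\Xi} f\|_{C(\Omega)} = \|p\|_{C(\Omega)} = \nu^{-1}(\mathcal{P}, \Xi)\, \|p|_\Xi\|_\infty = \nu^{-1}(\mathcal{P}, \Xi)\, \|f\|_{C(\Omega)},$$

which implies $\|L_{\mathcal{P},\Xi}\| \ge 1 / \nu(\mathcal{P}, \Xi)$.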

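We finally establish (2.7). For any $p \in \mathcal{P}$, let $p = \sum_{j=1}^{n} a_j p_j$ and $a = (a_1, \ldots, a_n)^T$.
Then $p|_\Xi = P_\Xi a$ and hence

$$\|p|_\Xi\|_\infty \le \|P_\Xi a\|_2 \le \sqrt{m}\, \|p|_\Xi\|_\infty.$$

Since

$$\sigma_{\min}(P_\Xi) = \min_{a \ne 0} \|P_\Xi a\|_2 / \|a\|_2,$$

(2.7) follows by (2.1). □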

In view of (2.3), the upper bound in (2.5) implies

$$\|f - L_{\mathcal{P},\Xi} f\|_{C(\Omega)} \le \big(1 + K_2 \sqrt{m} / \sigma_{\min}(P_\Xi)\big)\, E(f, \mathcal{P})_{C(\Omega)}, \qquad (2.8)$$

which shows that the approximation power of discrete least squares proportionally
reduces if $\sigma_{\min}(P_\Xi)$ (or $\nu(\mathcal{P}, \Xi)$) is small. We will discuss some practical consequences of
this fact in the next two sections.

Although  v(V,E)  gives  tighter  bounds  for  ||/p,s||,  <rmin(Ps)  has  a  clear  practical 
advantage  that  it  is  easily  computable  by  using  e.g.  the  singular  value  decomposition  of 
the  small  “local”  matrix  P*.  On  the  other  hand,  the  norming  constants  were  used  in  [6,  9] 
to  derive  estimates  for  the  approximation  error  of  radial  basis  function  interpolation  and 
moving  least  squares,  respectively. 
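As a minimal sketch of how $\sigma_{\min}(P_\Xi)$ and the factor in (2.8) might be evaluated in practice, the following Python/NumPy fragment forms a collocation matrix and reads off its smallest singular value. The function names, the scaled monomial basis and the sample points are illustrative assumptions that do not come from the text; the constant $K_2$ of (2.1) has to be supplied by the user.

```python
import numpy as np

def sigma_min(P):
    """Smallest singular value of the collocation matrix P (m x n, m >= n)."""
    return np.linalg.svd(P, compute_uv=False)[-1]

def error_factor(P, K2):
    """Factor 1 + K2*sqrt(m)/sigma_min(P) that multiplies E(f, P) in (2.8)."""
    m = P.shape[0]
    return 1.0 + K2 * np.sqrt(m) / sigma_min(P)

def monomial_collocation(points, n, h):
    """Matrix with entries (xi_i / h)**(j - 1), j = 1..n (illustrative basis)."""
    scaled = np.asarray(points, dtype=float) / h
    return np.vander(scaled, n, increasing=True)

if __name__ == "__main__":
    P = monomial_collocation([-0.1, 0.0, 0.1], n=2, h=1.0)
    print("sigma_min:", sigma_min(P))
    print("factor in (2.8) with K2 = 1 (assumed):", error_factor(P, K2=1.0))
```

For three points $\{-\xi, 0, \xi\}$ and the basis $1, t$ the columns of the matrix are orthogonal, so the smallest singular value is $\sqrt{2}\,\xi$, which makes this a convenient case for checking the routine by hand.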

Remark 2.2 If $p_1,\dots,p_n$ is an orthonormal basis for $\mathcal{P}$, then $\|a\|_2 = \|p\|_{L_2(\Omega)}$, $p = \sum_{j=1}^{n} a_j p_j$, and the constants $K_1, K_2$ in (2.1) are closely related to Nikolskii constants of the space $\mathcal{P}$, namely,

$$K_1 = N_{2,\infty}^{-1}(\mathcal{P}), \qquad K_2 = N_{\infty,2}(\mathcal{P}),$$

where

$$N_{q_1,q_2}(\mathcal{P}) := \max_{p \in \mathcal{P}\setminus\{0\}} \|p\|_{L_{q_1}(\Omega)}/\|p\|_{L_{q_2}(\Omega)}, \qquad 1 \le q_1, q_2 \le \infty.$$

In particular, if $\Omega = S^{d-1}$, the unit sphere in $\mathbb{R}^d$, and $\{p_1,\dots,p_n\}$ is the set of spherical harmonics forming an orthonormal basis for the space $\mathcal{P} = \mathcal{H}_q$ of spherical polynomials of degree $q$ in $d$ variables, then it is not difficult to prove that $K_2 = N_{\infty,2}(\mathcal{H}_q) = \sqrt{n/|S^{d-1}|}$, where $|S^{d-1}|$ denotes the surface area of $S^{d-1}$. Therefore, for any set $\Xi \subset S^{d-1}$ with $\#\Xi = m \ge n$, we have by (2.5),

$$\|L_{\mathcal{H}_q,\Xi}\| \le \sqrt{nm/|S^{d-1}|}\,/\,\sigma_{\min}(P_\Xi), \qquad (2.9)$$

which recovers in the case of interpolation ($m = n$) an error bound by Reimer [7] originally proved by using Lagrangian square sums (see also [10]).


3  Univariate  polynomials 

Let $\Omega$ be an interval $[-h,h]$ on the real line $\mathbb{R}$, and let

$$p_j(t) = (t/h)^{j-1}, \qquad j = 1,\dots,n.$$

Then $\mathcal{P}$ is the restriction to $[-h,h]$ of the space $\Pi^1_{n-1}$ of all univariate polynomials of degree at most $n-1$. By the well-known interpolation properties of the univariate polynomials, $\operatorname{rank} P_\Xi = n$ for any $\Xi = \{\xi_1,\dots,\xi_m\} \subset [-h,h]$, $m \ge n$, with distinct $\xi_i$'s.
For any $\Xi' = \{\xi'_1,\dots,\xi'_n\} \subset \Xi$, let $q_{\Xi'}$ denote the separation distance,

The Lebesgue constant $\|L_{\mathcal{P},\Xi'}\|$ of the corresponding interpolation scheme can be easily estimated as

$$\|L_{\mathcal{P},\Xi'}\| \le$$

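Since $\Xi'$ may be any subset of $\Xi$ of cardinality $n$ and since $\nu(\mathcal{P},\Xi) \ge \nu(\mathcal{P},\Xi')$, we get

$$\nu^{-1}(\mathcal{P},\Xi) \le (h/q_{\Xi,n})^{n-1},$$

where

$$q_{\Xi,n} := \max_{\Xi' \subset \Xi,\ \#\Xi' = n} q_{\Xi'}.$$

Hence, by (2.3) and (2.6),

$$\|f - L_{\Pi^1_{n-1},\Xi} f\|_{C[-h,h]} \le \bigl(1 + \sqrt{m}\,(h/q_{\Xi,n})^{n-1}\bigr)\,E(f,\Pi^1_{n-1})_{C[-h,h]}. \qquad (3.1)$$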


This last estimate shows that the univariate least squares polynomials have the approximation power of the best local polynomial approximation as $h \to 0$ provided $h/q_{\Xi,n}$ remains bounded. However, if the scattered points $\xi_1,\dots,\xi_m \in [-h,h]$ are clustered together in at most $n-1$ very tight groups, then $q_{\Xi,n}$ may be arbitrarily small, thus forcing the right hand side of (3.1) to blow up. To figure out what happens to $\|f - L_{\Pi^1_{n-1},\Xi} f\|_{C[-h,h]}$ in these circumstances, we consider the following example.

Let $h = 1$, $n = 2$, $f(t) = t^2 - 1/2$, and $\Xi = \{-\xi, 0, \xi\}$ for some $0 < \xi < 1$. It is easy to see that $L_{\Pi^1_1,\Xi} f = -1/2 + 2\xi^2/3$. Since $E(f,\Pi^1_1)_{C[-1,1]} = 1/2$, we have

$$\|f - L_{\Pi^1_1,\Xi} f\|_{C[-1,1]} = 1/2 + |1/2 - 2\xi^2/3| \le 2E(f,\Pi^1_1)_{C[-1,1]}$$

even though, by a simple calculation,

$$\|L_{\Pi^1_1,\Xi}\| = 1/3 + 1/\xi, \qquad \nu^{-1}(\mathcal{P},\Xi) = 1/q_{\Xi,2} = 1/\xi \to \infty \quad \text{as } \xi \to 0.$$

This may contribute to the opinion that $\|L_{\Pi^1_1,\Xi}\|$, $\sigma_{\min}(P_\Xi)$, $\nu(\mathcal{P},\Xi)$ and $q_{\Xi,n}$ are not the right quantities to describe the behaviour of the approximation. Indeed, as the three points $-\xi, 0, \xi$ coalesce, $L_{\Pi^1_1,\Xi} f$ converges to a Hermite interpolation polynomial provided the entries of $P_\Xi$ as well as the values of $f|_\Xi$ are exact. However, if we simulate "real world" data by adding to $f(-\xi), f(0), f(\xi)$ normally distributed errors with standard deviation $10^{-4}$, then the picture substantially changes. Table 1 shows that $\|f - L_{\Pi^1_1,\Xi} f\|_{C[-1,1]}$ does blow up in this case. For comparison we also include in the table the error $\|f - L_{\Pi^1_0,\Xi} f\|_{C[-1,1]}$ for the same contaminated data.

Tab. 1 Average ($d^1_{\mathrm{mean}}$) and maximum ($d^1_{\max}$) of $\|f - L_{\Pi^1_1,\Xi} f\|_{C[-1,1]}$ as well as maximum of $\|f - L_{\Pi^1_0,\Xi} f\|_{C[-1,1]}$ ($d^0_{\max}$) in 1000 tests with contaminated data

xi          10^-3     10^-4     10^-5     10^-6     10^-7     10^-8
d^1_mean    1.06      1.56      6.63      57.3      564       5630
d^1_max     1.24      3.39      24.9      240       2390      23900
d^0_max     1.00018   1.00018   1.00018   1.00018   1.00018   1.00018
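The experiment behind Table 1 can be imitated along the following lines. This is only a sketch under stated assumptions: the sup-norm on $[-1,1]$ is approximated on a uniform grid, the random number generator and its seed are arbitrary, and the helper names `sup_error` and `run_experiment` are hypothetical, so the figures it produces need not coincide with those in the table.

```python
import numpy as np

def sup_error(xi, degree, noise, n_grid=2001):
    """Grid approximation of ||f - L f||_C[-1,1] for f(t) = t^2 - 1/2, where L f
    is the degree-`degree` least squares fit to noisy samples at {-xi, 0, xi}."""
    pts = np.array([-xi, 0.0, xi])
    vals = pts**2 - 0.5 + noise
    P = np.vander(pts, degree + 1, increasing=True)      # basis 1, t, ... (h = 1)
    coef, *_ = np.linalg.lstsq(P, vals, rcond=None)
    t = np.linspace(-1.0, 1.0, n_grid)
    fit = np.vander(t, degree + 1, increasing=True) @ coef
    return np.max(np.abs(t**2 - 0.5 - fit))

def run_experiment(xi, trials=1000, noise_std=1e-4, seed=0):
    """Return (mean, max) of the degree-1 error and the max of the degree-0 error,
    both computed from the same contaminated data in each trial."""
    rng = np.random.default_rng(seed)
    d1 = np.empty(trials)
    d0 = np.empty(trials)
    for i in range(trials):
        noise = rng.normal(0.0, noise_std, size=3)
        d1[i] = sup_error(xi, 1, noise)
        d0[i] = sup_error(xi, 0, noise)
    return d1.mean(), d1.max(), d0.max()

if __name__ == "__main__":
    for xi in [1e-3, 1e-4, 1e-5, 1e-6, 1e-7, 1e-8]:
        print(xi, run_experiment(xi))
```

With exact data (`noise = 0`) the degree-1 error equals $1 - 2\xi^2/3$ for small $\xi$, in agreement with the example above; the growth appears only once the samples are perturbed.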

Thus, if $q_{\Xi,n}$ is too small, we cannot practically achieve with least squares the approximation order of $E(f,\Pi^1_{n-1})_{C[-h,h]}$ simply because the points lying too close to each other carry redundant information and we have at most $n-1$ clusters of such points. Therefore, we should adjust the polynomial degree to the given data paying attention to the trade-off between higher approximation power of higher degree polynomials and the "pollution" caused by the factor $q_{\Xi,n}^{-1}$ that increases with $n$. In practice one may choose maximal $n$ such that $h/q_{\Xi,n}$ is smaller than a prescribed tolerance value $0 < \kappa < \infty$.

4  Multivariate  polynomials 

The situation becomes substantially more complicated when we turn to multivariate polynomials. Let $\Omega$ be a bounded domain in $\mathbb{R}^d$ and let $\{p_1,\dots,p_n\}$, $n = \binom{d+q}{q}$, be a basis of the space $\mathcal{P} = \Pi^d_q$ of polynomials in $d$ variables of total degree at most $q$ satisfying (2.1) on $\Omega$. (For example, we may consider a properly scaled standard power basis with the center at a point in $\Omega$, or the Bernstein-Bézier basis with respect to some simplex overlapping $\Omega$ or a significant part of it.) Let, furthermore, $\Xi$ be an arbitrary finite set of points in $\Omega$ such that $m = \#\Xi \ge n$.

The first problem we face in the case $d \ge 2$ is that the matrix $P_\Xi$ may be rank deficient. It is clear, however, that there is no practical difference between this situation and the one when $P_\Xi$ has full rank but is extremely ill-conditioned, i.e., $\sigma_{\min}(P_\Xi)$ is very small. Moreover, (2.8) shows that even moderately small $\sigma_{\min}(P_\Xi)$ may significantly reduce the approximation power of $L_{\mathcal{P},\Xi}$. Clearly, the same can also happen in the univariate case if $q_{\Xi,n}$ is too small. The real difficulty of the multivariate case seems to be that simple characteristics of $\Xi$, like the separation distance $q_{\Xi,n}$, do not give much information about the norm of $L_{\mathcal{P},\Xi}$. For example, six equidistant points on the unit circle in $\mathbb{R}^2$ are well separated and look reasonably distributed. However, they are not good for least squares approximation from the space $\Pi^2_2$ since the matrix $P_\Xi$ is rank deficient. Suitably perturbed, these points will give rise to the least squares operator $L_{\Pi^2_2,\Xi}$ with a very large norm. More generally, the norm of $L_{\Pi^d_q,\Xi}$ will be large if the points in $\Xi \subset \mathbb{R}^d$ lie "too close" to an algebraic hypersurface of order $q$.
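The circle example can be checked numerically. The sketch below assumes the unscaled monomial basis $1, x, y, x^2, xy, y^2$ for $\Pi^2_2$ and a random perturbation of size $10^{-3}$ with a fixed seed; both choices are illustrative and not prescribed by the text.

```python
import numpy as np

def quadratic_collocation(pts):
    """Collocation matrix for the monomial basis 1, x, y, x^2, xy, y^2."""
    x, y = pts[:, 0], pts[:, 1]
    return np.column_stack([np.ones_like(x), x, y, x**2, x * y, y**2])

# Six equidistant points on the unit circle: the polynomial x^2 + y^2 - 1
# vanishes at all of them, so the columns are linearly dependent.
angles = 2.0 * np.pi * np.arange(6) / 6.0
circle = np.column_stack([np.cos(angles), np.sin(angles)])
s_circle = np.linalg.svd(quadratic_collocation(circle), compute_uv=False)

# A small perturbation restores full rank, but sigma_min stays tiny.
rng = np.random.default_rng(0)            # fixed seed, illustrative only
perturbed = circle + 1e-3 * rng.standard_normal(circle.shape)
s_pert = np.linalg.svd(quadratic_collocation(perturbed), compute_uv=False)

print("sigma_min on the circle:   ", s_circle[-1])
print("sigma_min after perturbing:", s_pert[-1])
```

The separation distance of the point set is essentially unchanged by the perturbation, while $1/\sigma_{\min}(P_\Xi)$, and with it the bound (2.8), becomes very large.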

If the data are comparatively dense in $\Omega$, namely the fill distance

$$h_{\Xi,\Omega} := \sup_{x \in \Omega}\,\min_{\xi \in \Xi} |x - \xi|$$

does not exceed some small positive constant depending on $\Omega$ and the polynomial degree, then the estimates of the norming constant $\nu(\Pi^d_q|_\Omega,\Xi)$ given in [9] provide a bound for $\|L_{\Pi^d_q|_\Omega,\Xi}\|$, in view of (2.6). For example, if $\Omega$ is a ball of radius $r$, then $\nu(\Pi^d_q|_\Omega,\Xi) \ge 1/2$ if $h_{\Xi,\Omega} \le 0.11\,r/q^2$.

On the other hand, without any density assumptions we can always rely on (2.8), where $\sigma_{\min}(P_\Xi)$ can be efficiently computed by well known algorithms of numerical linear algebra. In some sense, small $\sigma_{\min}(P_\Xi)$ indicates that the local data has "hidden redundancies" (e.g. too many points lying very close to the same straight line or the same ellipse) that prevent it from carrying enough information for a "full power" approximation of the underlying function from $\Pi^d_q$. Similar to the univariate case, but using $\sigma_{\min}(P_\Xi)$ instead of $q_{\Xi,n}$, we can adaptively choose the polynomial degree according to the following algorithm that has proven to be useful for scattered data fitting [3, 5].

Let $\Xi \subset \Omega \subset \mathbb{R}^d$, $\#\Xi = m$. Denote by $P^q_\Xi$ the matrix of the evaluations of appropriate basis functions for $\Pi^d_q$, $q \ge 0$, at the points $\xi \in \Xi$.

Algorithm 4.1 Starting with some $q = q_0 \ge 0$ such that $\binom{d+q}{q} \le m$, compute $\sigma_{\min}(P^q_\Xi)$. If $1/\sigma_{\min}(P^q_\Xi)$ is smaller than a prescribed tolerance $\kappa < \infty$, then compute the least squares approximation to the data in $\Xi$ and accept it as a reliable approximation on $\Omega$. Otherwise, repeat the same with $q = q_0 - 1$ and successively reduce the degree $q$ to $q_0 - 2, \dots, 0$, while $1/\sigma_{\min}(P^q_\Xi) \ge \kappa$. For $q = 0$ no comparison of $1/\sigma_{\min}(P^q_\Xi)$ with $\kappa$ is needed since $\|L_{\Pi^d_0|_\Omega,\Xi}\|$ is bounded for any $\Omega$ and $\Xi$.
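A possible realisation of Algorithm 4.1 for $d = 2$ is sketched below. It assumes a scaled monomial basis centred at a user-supplied point; the names `collocation` and `adaptive_fit` and the parameters `center`, `radius` and `tol` are hypothetical and serve only to illustrate the degree-reduction loop.

```python
import numpy as np

def collocation(points, q, center, radius):
    """Scaled bivariate monomials ((x-cx)/r)^i ((y-cy)/r)^j with i + j <= q."""
    z = (np.asarray(points, dtype=float) - center) / radius
    cols = [z[:, 0]**(k - j) * z[:, 1]**j
            for k in range(q + 1) for j in range(k + 1)]
    return np.column_stack(cols)

def adaptive_fit(points, values, q0, tol, center, radius):
    """Return (q, coefficients) for the largest degree q <= q0 that satisfies
    1/sigma_min < tol; degree 0 is accepted without any test."""
    m = len(points)
    for q in range(q0, -1, -1):
        P = collocation(points, q, center, radius)
        if P.shape[1] > m:
            continue                      # requires (q+1)(q+2)/2 <= m
        if q > 0:
            smin = np.linalg.svd(P, compute_uv=False)[-1]
            if smin * tol <= 1.0:         # same as 1/smin >= tol, safe for smin = 0
                continue                  # too ill-conditioned: lower the degree
        coef, *_ = np.linalg.lstsq(P, np.asarray(values, dtype=float), rcond=None)
        return q, coef

if __name__ == "__main__":
    x = np.linspace(-1.0, 1.0, 10)
    collinear = np.column_stack([x, np.zeros_like(x)])
    q, _ = adaptive_fit(collinear, 1.0 + x, q0=2, tol=100.0,
                        center=np.zeros(2), radius=1.0)
    print("accepted degree for collinear data:", q)
```

Replacing the test by one on the condition number $\|P^q_\Xi\|_2/\sigma_{\min}(P^q_\Xi)$ only changes the quantity compared with the tolerance.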

Note that, optionally, the condition number $\|P^q_\Xi\|_2/\sigma_{\min}(P^q_\Xi)$ of $P^q_\Xi$ can be used in the above algorithm instead of $1/\sigma_{\min}(P^q_\Xi)$, as it has been formulated in [5].

Abstract

In recent years application of a discrete wavelet transform (DWT) has become an established tool for the design of preconditioners for smooth, dense matrices, such as those that arise in the solution of certain integral equations. In this paper we consider the higher dimensional case, where the matrix A is not itself smooth, but has a smooth block structure. To precondition such matrices, we use repeated application of a level 1 blockwise DWT to exploit the fact that corresponding entries in adjacent blocks are close in value. We illustrate the effectiveness of our methods by means of numerical examples.

1  Introduction 

We have previously ([9]) considered wavelet-based preconditioning methods for dense matrices having the property that the entries vary smoothly (that is to say, adjacent entries are close in value) apart from known areas of singularity, for example a non-smooth diagonal band. The main idea is to use wavelet compression (see, for example, [14]) to convert "smoothness" in the original matrix into "smallness" in the transformed matrix, and then to approximate the transformed matrix by dropping small entries. Smooth matrices arise in a range of applications (see, for example, [6, 8, 10]) involving an essentially 1-dimensional discretization process. In higher dimensional cases the corresponding matrices have a block structure: each block is smooth and corresponding entries in different blocks vary smoothly; but discontinuities at the edges of the blocks mean that standard application of DWT does not give good compression. In this paper we extend the ideas of [9] to enable preconditioners to be designed for such matrices. Throughout we use Daubechies wavelets, which are orthogonal and have compact support.

2  DWT-based  preconditioners 

We are interested in fast solution of linear systems

$$Ax = b, \qquad x, b \in \mathbb{C}^n,\ A \in \mathbb{C}^{n \times n}, \qquad (2.1)$$

where $A$ is a large, dense matrix. Krylov subspace iterative methods, such as GMRES (described in [13]), can be used to solve (2.1), but in most cases preconditioning is required in order to obtain good convergence. One method of preconditioning is to seek a matrix $M \approx A$ such that $M^{-1}v$ can be calculated cheaply for any vector $v$. For smooth dense $A$ the task is usually made easier by transforming (2.1) into a wavelet basis (see e.g. [4, 5, 6, 10, 11]). When a DWT is applied to such an $A$, the resulting matrix $\tilde{A}$ has many small elements. A sparse approximation to $\tilde{A}$ can be obtained by setting small elements to zero. This is the main idea underlying most wavelet-based preconditioners.

2.1  Preconditioners  for  1-D  problems 

Typically $A$ is smooth apart from a narrow diagonal band. When a level $k$ standard DWT is applied, $\tilde{A}$ has a 'finger' pattern of large entries (caused by the non-smooth diagonal feature) and an $n/2^k \times n/2^k$ block of large entries at the top-left corner. Here $n$ should be a power of 2. We can form a preconditioner $M \approx \tilde{A}$ by setting to zero entries that fall below some chosen threshold, but, because of the finger pattern, a large amount of fill-in occurs under LU factorization. To avoid this problem $M$ can be obtained by setting to zero entries in $\tilde{A}$ that fall outside of a diagonal band. We describe this approach as a "band cut".

The finger pattern can be avoided by using DWTPer (DWT with permutations, first proposed in [6], see also [7]), which centres the fingers to form a sparse diagonal band whose width can be predicted accurately. $M$ can then be formed by applying a band cut to $\tilde{A}$ and (optionally) imposing a threshold.

An  alternative  way  of  avoiding  the  creation  of  a  finger  pattern  matrix  is  to  use  the 
Non-Standard-forms  (NS-forms)  of  Beylkin,  Coifman  and  Rokhlin  (see  [3])  to  represent 
A  in  terms  of  the  blocks  of  a  larger  matrix.  In  [9]  we  presented  a  new  way  of  using  the 
NS-form  submatrices  to  precondition  A  based  on  the  Schur  complement  and  recursive 
application  of  a  flexible  GMRES  iteration.  We  compared  four  alternative  DWT-based 
preconditioning  methods: 

P1 standard DWT preconditioner with band cut ([5]),

P2 DWTPer preconditioner with band cut ([6, 10]),

P3 NS-form preconditioner with threshold ([3, 11]),

P4 Recursive Schur complement preconditioner ([9]),

and found that, for smooth matrices with a diagonal singularity, P4 gave consistently good performance, P1 performed well for moderate singularities and P2 was best when the diagonal singularity was very pronounced. When we came to consider 2-D problems, the robustness of P4 encouraged us to consider ways of extending it to higher dimensions.

2.2  Extension  to  matrices  with  block  structure 

In the 2-dimensional case we are concerned with matrices that have a smooth block structure. We can compress dense block matrices of this type using two different types of DWT: The block DWT has a transform matrix of the form

$$W_B^{(m,n)} = I_m \otimes W^{(n)} = \begin{pmatrix} W^{(n)} & 0 & \cdots & 0 \\ 0 & W^{(n)} & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & W^{(n)} \end{pmatrix}, \qquad (2.2)$$


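where $W^{(n)}$ is a standard $n \times n$ DWT matrix and $0$ is the $n \times n$ zero matrix. It exploits smoothness within blocks. The Big Block DWT (BBDWT) exploits smoothness between blocks. It has a transform matrix of the form

$$W_{BB}^{(m,n)} = W^{(m)} \otimes I_n = \begin{pmatrix}
h_0 I & h_1 I & \cdots & h_{D-1} I & 0 & \cdots & \cdots & 0 \\
0 & 0 & h_0 I & h_1 I & \cdots & h_{D-1} I & 0 & \cdots \\
\vdots & & & \ddots & & & \ddots & \vdots \\
h_2 I & \cdots & h_{D-1} I & 0 & \cdots & 0 & h_0 I & h_1 I \\
g_0 I & g_1 I & \cdots & g_{D-1} I & 0 & \cdots & \cdots & 0 \\
0 & 0 & g_0 I & g_1 I & \cdots & g_{D-1} I & 0 & \cdots \\
\vdots & & & \ddots & & & \ddots & \vdots \\
g_2 I & \cdots & g_{D-1} I & 0 & \cdots & 0 & g_0 I & g_1 I
\end{pmatrix},$$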

where $h_0,\dots,h_{D-1}$ and $g_0,\dots,g_{D-1}$ are the low-pass and high-pass filter coefficients respectively ($D$ being the order of the wavelet transform), $I$ is the $n \times n$ identity matrix and $0$ is the $n \times n$ zero matrix. The resulting transformed matrix has a 'finger' structure of blocks, each with a diagonal structure. We can avoid the finger pattern by permuting the rows and columns of the transformed matrix so as to centre the blocks containing large entries. We call this modified big block transform BBDWTPer, because it is a big block version of the DWTPer transform described in [10]. We anticipate that BBDWTPer may be useful for preconditioning block matrices with a very strong block diagonal singularity (see the comparison of DWTPer and other DWT-based preconditioners in [9]), but we have not yet found example matrices for which BBDWTPer provides a good preconditioner. Preconditioners based on BBDWT and BBDWTPer are tested in Section 4; we now present a more effective method.
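The two transforms can be formed with Kronecker products. The sketch below uses the Haar wavelet (the Daubechies wavelet of order $D = 2$, for which no wrap-around rows occur) purely for brevity; the function names are hypothetical, and dense matrices are built only for illustration.

```python
import numpy as np

def haar_level1(size):
    """Level-1 orthogonal Haar DWT matrix (low-pass rows first); size must be even."""
    W = np.zeros((size, size))
    s = 1.0 / np.sqrt(2.0)
    for r in range(size // 2):
        W[r, 2 * r:2 * r + 2] = [s, s]                 # low-pass filter (h0, h1)
        W[size // 2 + r, 2 * r:2 * r + 2] = [s, -s]    # high-pass filter (g0, g1)
    return W

def block_dwt(m, n):
    """I_m (x) W^(n): acts inside each n x n block."""
    return np.kron(np.eye(m), haar_level1(n))

def big_block_dwt(m, n):
    """W^(m) (x) I_n: acts across blocks (BBDWT)."""
    return np.kron(haar_level1(m), np.eye(n))

if __name__ == "__main__":
    m, n = 4, 3
    C = np.arange(1.0, n * n + 1.0).reshape(n, n)   # arbitrary, non-smooth block
    A = np.kron(np.ones((m, m)), C)                 # all blocks identical
    W = big_block_dwt(m, n)
    T = W @ A @ W.T
    print("largest entry outside the coarse block:",
          np.abs(T[m * n // 2:, :]).max())
```

When all blocks coincide, the high-pass part of the BBDWT annihilates the matrix even though each block is far from smooth, which is the effect that the big block transform is designed to exploit.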

3  Recursive  BBDWT-based  preconditioning 

An alternative way of avoiding the 'finger' pattern is to use a 'Big Block' version of the NS-forms presented in [3]. We define the Big Block NS-form (BBNS-form) of a matrix as follows. To transform a matrix consisting of $m^2$ blocks, each of dimension $n$ (where $m$ and $n$ are powers of 2) we define $P_i$, $Q_i$ to be the $mn/2^i \times mn/2^{i-1}$ matrices such that

$$W_{BB}^{(m/2^{i-1},\,n)} = \begin{pmatrix} P_i \\ Q_i \end{pmatrix}. \qquad (3.1)$$

Given an $mn \times mn$ matrix $A$, define $T_0 = A$,

$$T_i = P_i T_{i-1} P_i^T, \quad A_i = Q_i T_{i-1} Q_i^T, \quad B_i = Q_i T_{i-1} P_i^T, \quad C_i = P_i T_{i-1} Q_i^T, \qquad (3.2)$$

$$\widetilde{T}_i = \begin{pmatrix} A_i & B_i \\ C_i & T_i \end{pmatrix}. \qquad (3.3)$$

The level $k$ BBNS-form of $A$ comprises $T_k$ together with $A_i, B_i, C_i$, $i = 1, 2, \dots, k$. (The blocks of $\widetilde{T}_i$ are arranged differently from those of the standard level 1 DWT of $T_{i-1}$. We have used this ordering in order to be consistent with the notation of [3].)

We propose to use banded approximations to the submatrices of the BBNS-form as the basis for our preconditioner. If the blocks of $A$ vary smoothly apart from a diagonal block band, then each of $A_{i+1}$, $B_{i+1}$, $C_{i+1}$ will have small entries except for a wrap-round diagonal block band. So we can approximate them by $\bar{A}_{i+1}$, $\bar{B}_{i+1}$, $\bar{C}_{i+1}$, formed by cutting to a block band, giving an approximation $\bar{T}_i$ to

$$\begin{pmatrix} A_{i+1} & B_{i+1} \\ C_{i+1} & T_{i+1} \end{pmatrix}. \qquad (3.4)$$

(In practice, it is unnecessary to compute $A_{i+1}$, $B_{i+1}$, $C_{i+1}$ and then to set entries outside the block band to zero; instead we can compute only the non-zero entries of $\bar{A}_{i+1}$, $\bar{B}_{i+1}$, $\bar{C}_{i+1}$. This enables us to reduce the cost of forming $\bar{T}_i$.)

We now show how this can help us to solve (2.1). We use a flexible GMRES iteration (see [12]) preconditioned by approximate solution of an equation of the form $Ay = v$ at each step. To do this we first apply a level 1 BBDWT with a block band cut to give

$$\begin{pmatrix} \bar{A}_1 & \bar{B}_1 \\ \bar{C}_1 & T_1 \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix}, \qquad (3.5)$$

where $y_1 = Q_1 y$, $y_2 = P_1 y$, $v_1 = Q_1 v$, $v_2 = P_1 v$. We solve this equation using the Schur complement $S_1 = T_1 - \bar{C}_1 \bar{A}_1^{-1} \bar{B}_1$. This requires us to solve an equation of the form

$$S_1 y_2 = w_2, \qquad (3.6)$$

which we do by a further GMRES iteration. We expect that $T_1$ will be an effective preconditioner for $S_1$ (see [1] §9.3), so we now seek a cheap way of applying $T_1^{-1}$ to a vector. To do this we repeat the process of applying a level 1 BBDWT and using the Schur complement. In summary, during the solution of (3.6) we solve a preconditioning equation of the form

$$T_1 y = v, \qquad y, v \in \mathbb{C}^{mn/2}. \qquad (3.7)$$

To do this cheaply we repeat the process of applying a level 1 BBDWT and using the Schur complement and obtain an equation of the form

$$S_2 z = w, \qquad z, w \in \mathbb{C}^{mn/4}. \qquad (3.8)$$


This in turn can be solved using flexible GMRES preconditioned by $T_2$. We continue recursively, solving equations of the form

$$S_i z = w, \qquad z, w \in \mathbb{C}^{mn/2^i} \qquad (3.9)$$

iteratively, preconditioning by solving equations of the form

$$T_i y = v, \qquad y, v \in \mathbb{C}^{mn/2^i}, \qquad (3.10)$$

until the matrix $T_i$ is small enough that $T_i^{-1}$ can be applied directly by means of LU factorization at low cost. Therefore, at level $i$, each GMRES iteration requires a preconditioning step that in turn calls for iterative solution by GMRES of a coarser level equation. At the coarsest level the preconditioner is applied directly using an LU factorization of $T_{i+1}$. This process is summarized in Algorithm 3.1.

Algorithm 3.1 Approximate solution of $T_i y = v$.

(1) Compute $v_1 = Q_{i+1} v$, $v_2 = P_{i+1} v$.

(2) Solve $\bar{A}_{i+1} w_1 = v_1$ for $w_1$.

(3) Set $w_2 = v_2 - \bar{C}_{i+1} w_1$.

(4) Define $S_{i+1} = T_{i+1} - \bar{C}_{i+1} \bar{A}_{i+1}^{-1} \bar{B}_{i+1}$.

(5) Solve $S_{i+1} y_2 = w_2$ for $y_2$ by flexible GMRES iteration, preconditioning with $T_{i+1}$, using Algorithm 3.1 if $i + 1 \le k$ and using matrices $L_{i+1}$, $U_{i+1}$ otherwise.

(6) Set $y_1 = w_1 - \bar{A}_{i+1}^{-1} \bar{B}_{i+1} y_2$.

(7) Set $y = Q_{i+1}^T y_1 + P_{i+1}^T y_2$.

To solve equation (2.1), we start the solution process for level $i = 0$ and apply a GMRES iteration with the preconditioner $T_1$ to the Schur complement of the transformed $T_0 = A$. The overall method is presented in Algorithm 3.2.

Algorithm 3.2 Solution of $Ax = b$ by recursively preconditioned flexible GMRES.

(1) Set up

(a) Input matrix $A$, vector $b$, tolerance $t$.

(b) Decide on values for:

• maximum wavelet level, $k$,

• tolerance $t_i$ for inner iterations,

• block band width for approximating the submatrices.

(c) Set $T_0 = A$ and $i = 0$.

(d) Recursively, for $i = 1, \dots, k + 1$, compute $T_i$, $A_i$, $B_i$, $C_i$, and factorize $A_i$.

(e) Factorize $T_{k+1}$ into $L_{k+1}$, $U_{k+1}$.

(2) Solve $T_0 x = b$ by flexible GMRES preconditioned using Algorithm 3.1.

Note that the relatively expensive step of computing the BBNS-form matrices $A_i$, $B_i$, $C_i$, $T_i$ is done only once.

4 Numerical results


Here we illustrate the effectiveness of our method, and compare it with some alternative approaches, by considering two example $mn \times mn$ matrices:

$$A_{ni+j,\,nk+l} = \begin{cases} c & i = k \text{ and } j = l, \\ \log\bigl((i-k)^2 + (j-l)^2\bigr) & \text{otherwise,} \end{cases} \qquad (4.1)$$

for $i, k = 0, 1, \dots, m-1$; $j, l = 0, 1, \dots, n-1$; $c$ a constant.

$$B_{ni+j,\,nk+l} = e^{-((i-k)^2 + (j-l)^2)}, \qquad (4.2)$$

for $i, k = 0, 1, \dots, m-1$; $j, l = 0, 1, \dots, n-1$.

Tables 1 and 2 give typical results for the matrices $A$ and $B$ respectively. The cost of reducing the relative residual norm to a tolerance of $10^{-6}$ is shown for matrices of various sizes using the following preconditioners:

P1 simple band preconditioner,

P2 standard BBDWT + band cut preconditioner,

P3 BBDWTPer + band cut preconditioner,

P4 recursive BBDWT-based preconditioner.

In each case GMRES was restarted after 10 iterations. '*' denotes non-convergence of GMRES(10). Unpreconditioned GMRES(10) failed to converge to the required tolerance for any size of matrix, so it is omitted from the tables.


                     Preconditioned GMRES                                   Direct
                     P1             P2             P3             P4        solution
 m    n    N = mn    its.  Mflops   its.  Mflops   its.  Mflops   its.  Mflops   Mflops
 8    8      64      30    0.65     49    1.2      38    0.99     6     0.32     0.21
16   16     256      58    17       *     *        *     *        7     5.5      12
32   32    1024      86    393      *     *        *     *        7     150      720
64   64    4096      *     *        *     *        *     *        7     6300     46000

Tab. 1. Cost of solving Ax = b.


                     Preconditioned GMRES                                   Direct
                     P1             P2             P3             P4        solution
 m    n    N = mn    its.  Mflops   its.  Mflops   its.  Mflops   its.  Mflops   Mflops
 8    8      64      8     0.19     8     0.26     8     0.26     4     0.26     0.21
16   16     256      62    19       66    21       63    21       6     5.6      12
32   32    1024      67    310      76    380      74    370      6     120      720
64   64    4096      69    5000     78    6000     78    6000     6     1700     46000

Tab. 2. Cost of solving Bx = b.

Clearly the recursive BBDWT approach gives better performance than the alternative preconditioners that we tested and offers substantial savings compared with direct solution.

5  Conclusion  and  future  work 

We have designed a preconditioning method that exploits smoothness between the blocks of a class of dense matrices, giving useful savings compared with both direct solution and preconditioned GMRES using band preconditioners. In the future we plan to explore a number of ways of further improving our methods including: (a) using a block DWT, in addition to the BBDWT, to exploit smoothness within each block; (b) using biorthogonal wavelets or multiwavelets (particularly the new supercompact Haar multiwavelets presented in [2]) to give improved compression; (c) preprocessing the matrix to enhance smoothness.

Abstract

One of the authors has studied the properties of a family of Riesz bases obtained by perturbing the Haar function using B-splines. Although these bases cannot be obtained by multiresolution analyses, they have other interesting properties. The present paper discusses how a discrete signal $\{a_r; 0 \le r \le N-1\}$ can be studied by considering a suitable function of the form $f(t) := \sum_{r=0}^{N-1} a_r f_r(t)$, so that the existing theory for functions defined over a continuous domain can be applied.


1  Introduction 

In what follows $\mathbb{Z}$ will denote the integers and $\mathbb{R}$ the real numbers; $t$ and $x$ will always denote real variables. The support of a function $f$ will be denoted by $\operatorname{supp}(f)$, its quadratic norm by $\|f\|$, and if $f \in L(\mathbb{R})$ its Fourier transform is defined by

$$\hat{f}(x) := \int_{\mathbb{R}} e^{-txi} f(t)\,dt.$$

In [3] we found a family of affine wavelet Riesz bases of $L^2(\mathbb{R})$, of bounded support and
arbitrary  degrees  of  smoothness,  obtained  by  smoothing  the  discontinuities  of  the  Haar 
function  using  B-splines.  Although  these  bases  are  not  orthogonal  they  are  symmetric, 
a  feature  that  is  lacking  in  orthogonal  wavelets.  Our  bases  can  be  constructed  so  that 
the  difference  between  the  frame  bounds  (which  are  given  explicitly)  can  be  made  as 
small  as  desired.  In  general,  orthogonal  wavelets  are  represented  by  infinite  series,  and 
for  computational  purposes  values  are  generated  over  a  discrete  set  using  the  cascade 
algorithm  [2,  5].  Our  bases,  on  the  other  hand,  are  given  in  closed  form.  We  now  briefly 
describe  how  these  wavelets  are  defined  and  introduce  additional  notation  and  make 
assumptions  that  will  be  used  in  the  subsequent  discussion. 

Let $N_m(t)$ denote the B-spline of order $m$ ($m \ge 2$) ([1], Chapter 4), $\chi_{[0,m-1]}(t)$ the characteristic function of $[0, m-1]$,

$$g(t) := \chi_{[0,m-1]}(t) \sum_{k=0}^{m-2} N_m(t-k), \qquad g_1(t) := g(t - m + 1), \qquad h(t) := (1/2) \sum_{k=0}^{m-2} N_m(t-k),$$

and $q(t) := g_1(t) - h(t)$. For $0 < \delta < 1/2$, let $\alpha_1 = -\alpha_2 = -\alpha_3 = \alpha_4 = 2(m-1)/\delta$, $\beta_1 = 2(m-1)$, $\beta_2 = 2(m-1)(1+\delta)/\delta$, $\beta_3 = \beta_4 = (m-1)/\delta$,

$$p^{(i)}(t) := (-1)^{i-1} q(\alpha_i t + \beta_i), \quad i = 1, 2, 3, 4, \qquad p^{(5)}(t) := -\bigl(\chi_{[1/2-\delta,\,1/2)}(t) - \chi_{[1/2,\,1/2+\delta)}(t)\bigr),$$

$$p^{(6)}(t) := \chi_{[0,1/2)}(t) - \chi_{[1/2,1)}(t), \qquad \text{and} \qquad \psi(t) := \sum_{i=1}^{6} p^{(i)}(t).$$

We will call $\psi$ the perturbed Haar wavelet. In [3] we proved that $\operatorname{supp}(\psi) \subset [-\delta, 1+\delta]$, $\psi \in C^{m-2}(\mathbb{R})$, and that if $\psi_{j,k}(t) := 2^{j/2}\psi(2^j t - k)$, then $\{\psi_{j,k}; j, k \in \mathbb{Z}\}$ is a Riesz basis, and we provided explicit upper and lower frame bounds. Moreover, in [7] we showed that given a function $p$, the wavelet coefficients $\langle p, \psi_{j,k}\rangle$ can be computed in $O(N)$ steps (where $N$ is the sample size), just as in the orthogonal case.

In  this  paper  we  will  discuss  the  application  of  the  perturbed  Haar  wavelet  to  the 
study  of  discrete  signals.  Let  us  first  look  at  the  orthogonal  case  for  comparison. 

Let $p$ be an orthogonal wavelet associated with a multiresolution analysis $\{V_j; j \in \mathbb{Z}\}$ and a scaling function $\varphi$, with the caveat that the definition of multiresolution analysis that we are adopting is that of [1] and [4], and therefore $V_j \subset V_{j+1}$, $j \in \mathbb{Z}$, whereas other authors, like [2] and [5], assume that $V_{j+1} \subset V_j$. If $a := \{a_r; 0 \le r \le N-1\}$ is an arbitrary sequence of real or complex numbers, then this discrete signal is transformed into a continuous one by considering the function $v(t) := \sum_{r=0}^{N-1} a_r \varphi(t - r)$.

The study of the signal $v(t)$ has two stages: the analysis stage consists in computing the wavelet coefficients, whereas the synthesis stage consists in reconstructing the signal from the wavelet coefficients. If $W_j$ denotes the closure of the linear span of the functions $p_{j,k}$, $k \in \mathbb{Z}$, then the $W_j$ are mutually orthogonal and $V_0 = \bigoplus_{j<0} W_j$. Since $v \in V_0$, it turns out that the wavelet coefficients $\langle v, p_{j,k}\rangle$ vanish for $j \ge 0$. Moreover, since $v(t)$ has compact support, for each $j < 0$ there is only a finite number of nonzero wavelet coefficients.

With the perturbed Haar wavelet we face an additional problem: the spaces $W_j$ are no longer orthogonal, and we can therefore no longer assume that all the wavelet coefficients corresponding to positive values of $j$ must vanish. Moreover, we may not even have a scaling function: in [8] we showed that if $\delta = 2^{\ell}$, where $\ell$ is a negative integer, then the perturbed Haar wavelet $\psi$ that corresponds to this value of $\delta$ cannot be generated by a multiresolution analysis.

To  overcome  these  difficulties,  we  proceed  as  follows.  Let  n  E  Z  be  such  that  2n  > 
4(m  —  1),  6^>(f)  :=  X[o,2(m-i))(<)?(4  *>{2}W  :=  g(4(m - 1)  - 1),  b(t)  :=  b^(t)  +  b^(t), 
$f_r(t) := a_r b(2^n t - 4(m-1)r)$, and $f(t) := \sum_{r=0}^{N-1} f_r(t)$. By a direct application of [3] Lemma 6 we obtain the following

Lemma  1  The  function  b(t)  has  the  following  properties: 

(a)  supp(b)  C  [0,4(m  -  1)],  (b )  be  Cm~2(M),  (c)  b(2(m  -  1))  -  1, 
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(d)  ^(0)  =  ^(2(m-  l))  =  ^&(4(m-l))=0,  1  <  k  <  m  -  2, 

(e)  The  total  variation  ofb  does  not  exceed  4 (m  —  1),  (f)  \b(t)\  <  1. 

From the preceding lemma we conclude that $\mathrm{supp}(f) \subset [0,1]$, and that the functions $f_r$ have disjoint supports. This implies that $\|f\|^2 = \|b\|^2 \|a\|_2^2\, 2^{-n}$, where $\|a\|_2^2 := \sum_{r \ge 0} |a_r|^2$. We will also use the $\ell_1$ norm: $\|a\|_1 := \sum_{r \ge 0} |a_r|$. Note, moreover, that $f \in C^{m-2}(\mathbb{R})$, and that
\[ f(2^{1-n}(m-1)(2r+1)) = f_r(2^{1-n}(m-1)(2r+1)) = a_r\, b(2(m-1)) = a_r. \]
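
The sampling identity above can be checked on a toy example. The following is only a sketch: the function `bump` is a hypothetical piecewise linear stand-in that shares just two properties of $b$ from Lemma 1 (support in $[0, 4(m-1)]$ and value 1 at $2(m-1)$), and the names `bump`, `f` and `sample_point` are illustrative assumptions, not notation from the paper.

```python
from fractions import Fraction

def bump(t, m):
    """Hypothetical stand-in for b: hat on [0, 4(m-1)] with peak 1 at 2(m-1)."""
    half = 2 * (m - 1)
    if 0 <= t <= 2 * half:
        return 1 - abs(t - half) / Fraction(half)
    return Fraction(0)

def f(t, a, m, n):
    """f(t) = sum_r a_r * b(2^n t - 4(m-1) r), with b replaced by the stand-in."""
    return sum(a_r * bump(2**n * t - 4 * (m - 1) * r, m)
               for r, a_r in enumerate(a))

def sample_point(r, m, n):
    """The point 2^(1-n) (m-1) (2r+1) at which f should return a_r."""
    return Fraction(2 * (m - 1) * (2 * r + 1), 2**n)

m, n = 3, 5   # illustrative choice: 2^n = 32 > 4(m-1) = 8
a = [Fraction(3), Fraction(-1, 2), Fraction(7, 4), Fraction(2)]
recovered = [f(sample_point(r, m, n), a, m, n) for r in range(len(a))]
print(recovered == a)
```

Because the shifted copies overlap at most at endpoints where the stand-in vanishes, each sample picks out exactly one coefficient.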

In theory, given all its wavelet coefficients, the function $f$ can be reconstructed using the frame algorithm or other, even faster, algorithms [5]. However, since there may be an infinite number of nonzero wavelet coefficients, the application of such algorithms may not always be practical. We will adopt an approximation approach. If $A = A(\delta, m)$ and $B = B(\delta, m)$ are respectively the lower and upper frame bounds of the Riesz basis
generated  by  hj,k  '•=  if^j^k),  and  Lf  :=  ]Cj,fc,ez  hj,k^j,ki  then  from  the  error 
estimates for the frame algorithm we know that $\|Lf - f\| \le ((B - A)/(B + A))\,\|f\|$. Since, as remarked above, we can make $A$ and $B$ as close to 1 as we want by making $\delta$ sufficiently small, we conclude that for every $\varepsilon > 0$ there is a $\delta_0$ such that if $0 < \delta < \delta_0$, then $\|Lf - f\| \le \varepsilon \|f\|$. To approximate $f$ using the wavelet coefficients it will therefore suffice to approximate $Lf$ by an operator of the form

h 

g  I  _  hj^j 

j=ji  kez 

Observe that since $f$ has bounded support, $Ef$ reduces to a finite sum.
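
To see how the error factor $(B - A)/(B + A)$ in the frame algorithm estimate behaves as the frame bounds approach 1, a small numerical sketch may help. The bound pairs below are hypothetical values chosen for illustration; they are not computed from the perturbed Haar wavelet.

```python
def frame_error_factor(A, B):
    """Return (B - A) / (B + A), the factor in ||Lf - f|| <= factor * ||f||."""
    if not 0 < A <= B:
        raise ValueError("frame bounds must satisfy 0 < A <= B")
    return (B - A) / (B + A)

# Hypothetical frame bounds tightening around 1.
for A, B in [(0.5, 1.5), (0.9, 1.1), (0.99, 1.01)]:
    print(f"A={A}, B={B}: relative error at most {frame_error_factor(A, B):.4f}")
```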

Our  objective  will  be  accomplished  by  showing  that  there  is  a  constant  K  such  that 

<K\\a\\2~W2. 

kez 

But  first  we  need  to  prove  five  lemmas,  of  some  independent  interest.  We  begin  with 

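Lemma 2 Let $\{a_k;\ k \in \mathbb{Z}\}$ and $\{b_k;\ k \in \mathbb{Z}\}$ be increasing sequences such that $a_k < b_k \le a_{k+2}$, $k \in \mathbb{Z}$. Assume that $f_k \in L^2(\mathbb{R})$ and that $\mathrm{supp}(f_k) \subseteq [a_k, b_k]$, and let $f := \sum_{k \in \mathbb{Z}} f_k$. Then $\|f\|^2 \le 2 \sum_{k \in \mathbb{Z}} \|f_k\|^2$.

Proof: If $r < k-1$ then $b_r \le b_{k-2} \le a_k$, whereas if $r > k+1$ then $a_r \ge a_{k+2} \ge b_k$. This implies that if $r \ne k-1, k, k+1$ then $f_r(t) = 0$ on $[a_k, b_k]$, so that at most two of the functions $f_k$ are nonzero at any point, and we readily see that
\[ \|f\|^2 \le 2 \sum_{k \in \mathbb{Z}} \int_{a_k}^{b_k} |f_k(t)|^2\, dt = 2 \sum_{k \in \mathbb{Z}} \|f_k\|^2. \qquad \Box \]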
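
The inequality of Lemma 2 can be illustrated numerically. The sketch below is an assumption-laden toy check, not part of the paper: it uses random step functions $f_k$ supported on $[k, k+2]$ (so that $a_k = k$ and $b_k = k + 2 = a_{k+2}$) and approximates the $L^2$ norms by Riemann sums on a uniform grid.

```python
import random

def check_lemma2(num_funcs=6, samples_per_unit=200, seed=0):
    """Compare ||sum_k f_k||^2 with 2 * sum_k ||f_k||^2 on a uniform grid.

    f_k is a random step function supported on [k, k + 2], so only
    neighbouring supports overlap.
    """
    rng = random.Random(seed)
    h = 1.0 / samples_per_unit
    grid_len = (num_funcs + 2) * samples_per_unit
    total = [0.0] * grid_len          # samples of f = sum_k f_k
    sum_of_norms = 0.0                # sum_k ||f_k||^2
    for k in range(num_funcs):
        start = k * samples_per_unit
        for i in range(start, start + 2 * samples_per_unit):
            value = rng.uniform(-1.0, 1.0)
            total[i] += value
            sum_of_norms += value * value * h
    norm_of_sum = sum(v * v for v in total) * h
    return norm_of_sum, 2.0 * sum_of_norms

lhs, rhs = check_lemma2()
print(lhs <= rhs)
```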

Lemma 3 Let $u \in L^2(\mathbb{R})$ be a function with support in an interval $[a, b]$ with $b - a \le 1$. If $j < 0$, then
\[ \sum_{k \in \mathbb{Z}} |\langle u, \psi_{j,k} \rangle|^2 \le 3 \cdot 2^j \|u\|^2. \]

Proof: Let $j < 0$ be arbitrary but fixed, and define $I(k) := \mathrm{supp}\, \psi_{j,k} \cap [a, b]$. Then $I(k) \subset [2^{-j}(k - \delta), 2^{-j}(k + \delta + 1)] \cap [a, b]$. If $I(k) = \emptyset$ then either $2^{-j}(k + \delta + 1) < a$, or $2^{-j}(k - \delta) > b$. This implies that if $I(k) \ne \emptyset$, then $k \in (2^j a - \delta - 1, 2^j b + \delta)$. Since the length of this interval is less than 3, we conclude that there are at most three values of $k$ for which $I(k) \ne \emptyset$. In other words, there are at most three values of $k$ for which $\langle u, \psi_{j,k} \rangle \ne 0$. Since $|\psi(t)| \le 1$, for any such $k$ we have:
\[ |\langle u, \psi_{j,k} \rangle|^2 = 2^j \left| \int_{I(k)} u(t) \psi(2^j t - k)\, dt \right|^2 \le 2^j \int_{I(k)} |u(t)|^2\, dt \int_{I(k)} |\psi(2^j t - k)|^2\, dt \le (b - a)\, 2^j \int_{I(k)} |u(t)|^2\, dt \le 2^j \|u\|^2. \qquad \Box \]

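Lemma 4 Let $\alpha, \beta, \gamma, \sigma \in \mathbb{R}$, with $\alpha, \gamma \ne 0$, and define $c(t) := q(\alpha t + \beta)$, $d(t) := q(\gamma t + \sigma)$, and
\[ K = 2 \left\{ \left[ 25/64 + (25/192)^{2/3} \right] (m-1)^4 + (m-1)^2/1024 \right\}. \]
If $j > 0$ and $i = 5, 6$, then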

(a)  £ \(d,Cj,k)\2  <  2  UVKa~2  +  1/3)Y>;  (b)  £  \{d,p%)  “  <  (2V2  +  1/2)*  2"*. 

fcez  kez  ’  V  y 

Proof:  (a)  From  [3]  p.  3367  (bearing  in  mind  the  slightly  different  definition  of  the 
Fourier  transform),  we  have 

g(x)  =  (i/x)e-^-1^2  [e-(—  W  _  ((2/®)sin*/2)m‘1"  . 

From [1] p. 56 (3.2.16),
\[ \hat N_m(x) = e^{-(1/2)mxi} \left[ (2/x) \sin x/2 \right]^m. \qquad (1.1) \]
Let
\[ s(x) := \left[ (2/x) \sin x/2 \right]^{m-1}. \]

Then 

gi{x)  =e-(m~1'>xig{x)  =  {i/x)\e-^m-1)xi  -  e~^2^m-^xis{x)\. 

Since 

. .  %«) 

fc=0 

a  straightforward  computation  yields 

h(x)  =  —  i/(2x)[e~^1^m~1^xl  —  e”^2^m_1^i]s(x), 


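whence
\[ \hat q(x) = i x^{-1} e^{-(m-1)xi} \left[ \cos (m-1)x - s(x) \cos \tfrac12 (m-1)x + i \left( 2 s(x) \sin \tfrac12 (m-1)x - \sin (m-1)x \right) \right]. \qquad (1.2) \]
This implies that
\[ |\hat q(x)|^2 \le 8 x^{-2}, \quad x \ne 0. \qquad (1.3) \]
On the other hand,
\[ \hat q(x) = i x^{-1} e^{-(m-1)xi} \left[ (v_1 + v_2) + i (v_3 + v_4) \right], \]
where
\[ v_1 := \cos (m-1)x - \cos \tfrac12 (m-1)x, \quad v_2 := [1 - s(x)] \cos \tfrac12 (m-1)x, \]
\[ v_3 := s(x) \left[ 2 \sin \tfrac12 (m-1)x - \sin (m-1)x \right], \quad v_4 := [s(x) - 1] \sin (m-1)x. \]
A Maclaurin expansion shows that $|v_1| \le (5/8)(m-1)^2 x^2$. Since $1 - u^{m-1} = (1 - u) \sum_{k=0}^{m-2} u^k$ and $|\sin u| \le |u|$, we infer that
\[ |1 - s(x)| \le (m-1)\, |1 - (2/x) \sin x/2| = (m-1)(2/|x|)\, |x/2 - \sin x/2|. \]
Since $|u - \sin u| \le |u|^3/6$, we conclude that $|1 - s(x)| \le (m-1) x^2/48$. Thus,
\[ |v_2(x)| \le (m-1) x^2/48, \quad \text{and} \quad |v_4(x)| \le (m-1) x^2/48. \]
Another Maclaurin expansion yields $|v_3| \le (5/24)(m-1)^3 |x|^3$. Clearly $|v_3| \le 3$; thus $|v_3| = |v_3|^{2/3} |v_3|^{1/3} \le (25/192)^{1/3} (m-1)^2 x^2$. Since
\[ |\hat q(x)|^2 = x^{-2} \left[ (v_1 + v_2)^2 + (v_3 + v_4)^2 \right] \le 2 x^{-2} \left[ v_1^2 + v_2^2 + v_3^2 + v_4^2 \right], \]
we deduce that
\[ |\hat q(x)|^2 \le K x^2. \qquad (1.4) \]
From Plancherel's identity we have:
\[ \langle d, c_{j,k} \rangle = 2^{j/2} \int_{\mathbb{R}} d(t)\, c(2^j t - k)\, dt = \frac{2^{j/2}}{2\pi} \int_{\mathbb{R}} e^{kxi}\, \hat c(x)\, \hat d(2^j x)\, dx = \frac{2^{j/2}}{2\pi} \int_0^{2\pi} e^{kxi} \sum_{r \in \mathbb{Z}} \hat c(x + 2\pi r)\, \hat d(2^j (x + 2\pi r))\, dx. \]
This means that $\{ 2^{-j/2} \langle d, c_{j,k} \rangle;\ k \in \mathbb{Z} \}$ is the sequence of Fourier coefficients of the function $\sum_{r \in \mathbb{Z}} \hat c(x + 2\pi r)\, \hat d(2^j (x + 2\pi r))$. Thus, applying Bessel's identity and then the Cauchy-Schwarz inequality twice (once for sums and once for integrals), we have:
\[ 2\pi\, 2^{-j} \sum_{k \in \mathbb{Z}} |\langle d, c_{j,k} \rangle|^2 = \int_0^{2\pi} \Big| \sum_{r \in \mathbb{Z}} \hat c(x + 2\pi r)\, \hat d(2^j (x + 2\pi r)) \Big|^2 dx \]
\[ \le \int_0^{2\pi} \Big[ |\hat c(x) \hat d(2^j x)| + |\hat c(x - 2\pi) \hat d(2^j (x - 2\pi))| + \Big| \sum_{r \ne 0, -1} \hat c(x + 2\pi r)\, \hat d(2^j (x + 2\pi r)) \Big| \Big]^2 dx \]
\[ \le \Big[ \Big( \int_0^{2\pi} |\hat c(x) \hat d(2^j x)|^2 dx \Big)^{1/2} + \Big( \int_0^{2\pi} |\hat c(x - 2\pi) \hat d(2^j (x - 2\pi))|^2 dx \Big)^{1/2} + \Big( \int_0^{2\pi} \Big| \sum_{r \ne 0, -1} \hat c(x + 2\pi r)\, \hat d(2^j (x + 2\pi r)) \Big|^2 dx \Big)^{1/2} \Big]^2 \]
\[ =: \left[ \sqrt{S_1} + \sqrt{S_2} + \sqrt{S_3} \right]^2. \]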



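Since $\hat c(x) = \alpha^{-1} e^{(\beta/\alpha)xi}\, \hat q(\alpha^{-1} x)$, (1.3) implies that
\[ |\hat c(x + 2\pi r)|^2 \le 8 |x + 2\pi r|^{-2}, \quad x \ne -2\pi r, \qquad (1.5) \]
whereas from (1.4) we see that
\[ |\hat c(x + 2\pi r)|^2 \le K \alpha^{-4} |x + 2\pi r|^2. \qquad (1.6) \]
Since $\hat d(x) = \gamma^{-1} e^{(\sigma/\gamma)xi}\, \hat q(\gamma^{-1} x)$, (1.3) also implies that
\[ |\hat d(2^j (x + 2\pi r))|^2 \le 4^{-j+1} \cdot 2\, |x + 2\pi r|^{-2}, \quad x \ne -2\pi r. \qquad (1.7) \]
Since $S_1$ is obtained by integrating the product of the left-side members of (1.6) and (1.7) (with $r = 0$) over an interval of length $2\pi$, we readily see that
\[ S_1 \le 16 \pi K \alpha^{-4} 4^{-j}. \qquad (1.8) \]
A similar argument yields
\[ S_2 \le 16 \pi K \alpha^{-4} 4^{-j}. \qquad (1.9) \]
From Minkowski's inequality
\[ S_3 \le \int_0^{2\pi} \sum_{r \ne 0, -1} |\hat c(x + 2\pi r)|^2 \sum_{r \ne 0, -1} |\hat d(2^j (x + 2\pi r))|^2\, dx. \]
If $x \in [0, 2\pi]$ and $r \ge 1$ then from (1.5) we have:
\[ \sum_{r \ge 1} |\hat c(x + 2\pi r)|^2 \le 2 \pi^{-2} \sum_{r \ge 1} r^{-2} = 1/3, \]
whereas (1.7) implies that
\[ \sum_{r \ge 1} |\hat d(2^j (x + 2\pi r))|^2 \le 2 \cdot 4^{-j} \pi^{-2} \sum_{r \ge 1} r^{-2} = 4^{-j}/3. \]
Similarly,
\[ \sum_{r \le -2} |\hat c(x + 2\pi r)|^2 \le 2 \pi^{-2} \sum_{r \ge 1} r^{-2} = 1/3, \qquad \sum_{r \le -2} |\hat d(2^j (x + 2\pi r))|^2 \le 2 \cdot 4^{-j} \pi^{-2} \sum_{r \ge 1} r^{-2} = 4^{-j}/3, \]
whence we conclude that $S_3 \le (4\pi/9)\, 4^{-j}$. Combining (1.8), (1.9) and the preceding inequality, the assertion follows.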

(b)  Note  that  is  with  6  =  1/2.  Since  p^(x)  —  2ix~le~^l^2^xi{l  —  cos&r), 
we  see  that 

|p45> (a:  +  27rr)|2  <  4\x  +  27rr|-2,  x  ^2nr.  (1.10) 

On the other hand, the inequality $|1 - \cos \delta x| \le (1/2) \delta^2 x^2$ implies that $|\hat\psi^{(5)}(x)| \le \delta^2 |x|$; therefore
\[ |\hat\psi^{(5)}(x + 2\pi r)|^2 \le \delta^4 |x + 2\pi r|^2. \qquad (1.11) \]
We now repeat the argument employed in (a), using (1.10) instead of (1.5), (1.11) instead of (1.6), and bearing in mind that $\delta < 1/2$. □

We now find bounds for the quadratic norms of $\psi(t)$ and $b(t)$.

Lemma 5 (a) $\|\psi\| \le 1$, (b) $\|b\| \le 2(m-1)$.

Proof: (a) [1] Theorem 4.3 implies that the functions $N_m$ are nonnegative. This implies that both $g$ and $h$ are nonnegative. In the proof of [3] Lemma 6(f) we show that
\[ \int_{\mathbb{R}} g(t)\, dt = \int_{\mathbb{R}} h(t)\, dt = (m-1)/2, \]
whence
\[ \int_{\mathbb{R}} |q(t)|\, dt \le m - 1. \]
Moreover, $|q(t)| \le 1$ ([3] Lemma 6(h)). Thus,
\[ \int_{\mathbb{R}} |q(t)|^2\, dt \le \int_{\mathbb{R}} |q(t)|\, dt \le m - 1. \]
Therefore,
\[ \int_{\mathbb{R}} |\psi^{(i)}(t)|^2\, dt = \frac{\delta}{2(m-1)} \int_{\mathbb{R}} |q(t)|^2\, dt \le \delta/2, \quad i = 1, 2, 3, 4. \]
This implies that
\[ \int_{\mathbb{R}} |\psi(t)|^2\, dt \le 4\delta/2 + \int_{\mathbb{R}} |\psi^{(5)}(t) - \psi^{(6)}(t)|^2\, dt = 2\delta + (1 - 2\delta) = 1. \]
(b)
\[ \int_{\mathbb{R}} |b(t)|^2\, dt \le \int_{\mathbb{R}} |b(t)|\, dt = 2 \int_{\mathbb{R}} |q(t)|\, dt \le 2(m-1). \qquad \Box \]
Theorem  1 

(a)  If  j  <  0, 


<  2 VS(m  -  l)||a||  2^2 


(b)  Let  K  be  defined  as  in  Lemma  4 •  If  j  >  0? 


<  8  [V5  ( K2l~n  +  1/3)  +  V2+  1/3]  Halb  2~^2. 

JfcGZ 

Proof:  Assume  first  that  j  <  0.  Applying  Lemma  2,  Lemma  3,  and  Lemma  5,  we  have: 
2 

BAMik*  ^ 2 E IIMaOiM2  =  2M2 E K/'^,0l2 

ke z  fcez  fc€Z 

<  2  V  |(/,^,fc>|2  <  6||/||22>  <  6||6||2||a||22-’_n  <  24(m  -  l)2||af  2’~". 



Assume  now  that  j  >  0.  Setting  br^  (t)  :=  arb^  (2 nt  —  4 (m  —  l)r),  we  see  that  fr(t)  = 
b{r1](t)  +  b<2}(t).  Thus, 

<  EE  E  Y,(br}’P^j,k  ■ 

k€Z>  i~l  t=l  r=0  fceZ 

Applying  Lemma  2  and  Lemma  5  as  above,  we  see  that 


he  Z  fc€Z 

Since  the  Fourier  transforms  of  q(t)  and  X[o,2(m-i))<l(t)  are  identical,  and  the  functions 

br^  are  of  the  form  ar  q(at  +  j3)  or  ar  X[o,2(m-i))(at  +  0)q(dt  +  (3)  with  \a\  —  2n,  from 
Lemma  4  we  have: 


E  |$i}>P$>f  ^  2I«-I2  (2^2_n  +  1/3) 


2-J,  e  =  1,2, 3, 4, 


E  I ^  Kl'2  (v^+  l/3)22^',  t  =  5,6, 
fcez 

whence the assertion readily follows. □

Abstract

In this paper we consider B-wavelets of order 2, i.e. piecewise linear spline prewavelets of smallest support, over nonuniform knot sequences. We discuss an example showing that for $1 < p \le \infty$, there is no absolute $L_p$-stability for these B-wavelets. This means that regardless of what specific scaling of the B-wavelets is chosen, the corresponding stability constants cannot be made independent of the knot sequences involved.


1  Introduction 


Polynomial splines are fundamental tools in numerous branches of applied mathematics, and for spline spaces defined over a given knot sequence, the basis of choice is provided by B-splines, which possess many attractive properties for numerical computations. One of these important properties of B-splines is their absolute stability. Given a B-spline basis $\{B_i\}_{i \in I}$ of polynomial order $d$ over a valid knot sequence $t$, a classical result by de Boor [1] states that properly normalized B-splines are stable in the sense that for each set $\{b_i\}_{i \in I}$ of real coefficients it holds that
\[ C_d^{-1} \|b\|_p \le \Big\| \sum_{i \in I} b_i\, s_i^{-1/p} B_i \Big\|_p \le \|b\|_p. \qquad (1.1) \]
Here $\|\cdot\|$ denotes the standard integral and discrete $p$-norms for $1 \le p \le \infty$, respectively, and the normalizing factor $s_i$ for each B-spline is the length of its support divided by the order $d$. The important point is that the positive constant $C_d$ is dependent on the order $d$ alone, and not in any way on the underlying knot sequence $t$.

Since nested knot sequences give rise to nested spline spaces, spline functions have also become a focus of attention within the theory of wavelets and multiresolution analysis, starting with cardinal spline wavelets on infinite equally spaced and uniformly refined knot sequences, for which Fourier transform techniques are available, see [3] and the references therein.

The study of spline wavelets on bounded intervals, for arbitrary knot sequences and nonuniform refinement began with the papers [4], [5] and [2], respectively. The construction of so-called minimally supported B-wavelets for a given spline order $d$ and two nested knot sequences to provide a basis of the relative orthogonal complement (wavelet) space is described in detail in [6]. This means that given the coarse and fine knot sequence, there exist explicit algorithms to determine the supports of the B-wavelet functions, the so-called minimal intervals, and also to compute the corresponding wavelet functions, though only up to a normalization constant.

One open problem, however, is how to fix the normalization factor for each B-wavelet function to achieve the best possible stability for the whole B-wavelet basis. We provide an example for the case of piecewise linear wavelets, i.e. polynomial order 2, that shows that for $1 < p \le \infty$ there is no absolute stability of B-wavelets, meaning that there is no choice of normalization that provides absolute stability constants which are completely independent of the underlying knot sequences. $L_p$-stability estimates involving a quantity dependent on the knot sequences for $1 < p \le \infty$ and showing absolute stability for $p = 1$ are given in [7].

2  Piecewise  linear  B-wavelets 

The  theory  of  B-wavelets  [6]  covers  general  cases  of  knot  refinement,  such  as  situations 
where  several  or  no  knots  at  all  are  inserted  into  an  old  knot  interval,  or  where  the 
multiplicity  of  an  existing  knot  is  increased.  For  our  purposes,  however,  it  is  sufficient  to 
consider  what  one  might  call  the  standard  setting,  where  all  knots  are  simple  except  at 
the  interval  endpoints,  which  we  can  count  as  double  knots,  and  where  exactly  one  new 
knot  is  inserted  strictly  between  two  old  ones. 

Our notations are as follows for the closed interval $[0,1]$. We have a coarse knot sequence with $n - 1$ interior knots, namely
\[ \tau:\ 0 = \tau_0 < \tau_1 < \dots < \tau_n = 1. \]
Strictly between each pair of coarse knots $\tau_{i-1}$ and $\tau_i$ we insert a new knot $s_i$ at an arbitrary location, i.e.
\[ \tau_{i-1} < s_i < \tau_i \quad \text{for each } i = 1, \dots, n. \]
Thus we have a sequence $s$ of new knots
\[ s:\ 0 < s_1 < \dots < s_n < 1. \]
The fine knot sequence $t = \tau \cup s$, when ordered appropriately, is given as
\[ t:\ 0 = t_0 < t_1 < \dots < t_{2n} = 1, \]
where the even numbered knots in $t$ correspond to old knots in $\tau$, while the odd numbered knots represent the newly inserted knots from $s$. To account for the boundary, we treat the interval endpoints as double knots by setting $\tau_{-1} = t_{-1} = 0$ and $\tau_{n+1} = t_{2n+1} = 1$.

For our investigations it is necessary to introduce also some notation related to the knot spacings. We set
\[ d_i = t_{i+1} - t_i \ \text{ for } i = 0, \dots, 2n-1, \quad \text{and} \quad \delta_i = t_{i+1} - t_{i-1} \ \text{ for } i = 0, \dots, 2n, \]
which means $\delta_0 = d_0 = t_1 - t_0$ and $\delta_{2n} = d_{2n-1} = t_{2n} - t_{2n-1}$ at the boundary. Thus $\delta_i$ is the distance between two consecutive old knots if $i$ is odd, and between two consecutive new knots if $i$ is even (and not at the boundary).
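
The spacing definitions above translate directly into code. The following sketch uses exact rational arithmetic; the function name `knot_spacings` and the particular knot values (a case with $n = 3$ and two new knots close to the old knot $1/3$) are illustrative assumptions.

```python
from fractions import Fraction as F

def knot_spacings(t):
    """Return (d, delta) for a fine knot sequence t_0 < ... < t_{2n} on [0, 1].

    d[i]     = t[i+1] - t[i]    for i = 0, ..., 2n-1
    delta[i] = t[i+1] - t[i-1]  for i = 0, ..., 2n,
    with the endpoints counted as double knots: t[-1] = t[0], t[2n+1] = t[2n].
    """
    last = len(t) - 1
    ext = [t[0]] + list(t) + [t[last]]        # ext[i + 1] == t[i]
    d = [t[i + 1] - t[i] for i in range(last)]
    delta = [ext[i + 2] - ext[i] for i in range(last + 1)]
    return d, delta

eps, eta = F(1, 10), F(1, 20)                 # illustrative values only
t = [F(0), F(1, 3) - eps, F(1, 3), F(1, 3) + eta, F(2, 3), F(5, 6), F(1)]
d, delta = knot_spacings(t)
print(d)
print(delta)
```

For odd $i$ the entries of `delta` are distances between old knots, and for even interior $i$ they are distances between new knots, as stated in the text.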

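We also introduce the index sets
\[ \Omega = \{1, 3, \dots, 2n-1\} \quad \text{and} \quad \Omega_0 = \{3, 5, \dots, 2n-3\}. \]
The piecewise linear functions on the knot sequences $\tau \subset t$ form nested linear spaces $V_0 \subset V_1$ of dimensions $n + 1$ and $2n + 1$, respectively. The corresponding piecewise linear B-splines forming a basis of these spaces are simple hat functions. We denote them as $\varphi_j$ and $\gamma_i$ for $\tau$ and $t$, respectively, where with the necessary adjustments at the endpoints,
\[ \varphi_j(x) = \begin{cases} (x - \tau_{j-1})/\delta_{2j-1} & \text{if } x \in [\tau_{j-1}, \tau_j], \\ (\tau_{j+1} - x)/\delta_{2j+1} & \text{if } x \in [\tau_j, \tau_{j+1}], \\ 0 & \text{otherwise}, \end{cases} \quad \text{for } j = 0, \dots, n, \qquad (2.1) \]
\[ \gamma_i(x) = \begin{cases} (x - t_{i-1})/d_{i-1} & \text{if } x \in [t_{i-1}, t_i], \\ (t_{i+1} - x)/d_i & \text{if } x \in [t_i, t_{i+1}], \\ 0 & \text{otherwise}, \end{cases} \quad \text{for } i = 0, \dots, 2n. \qquad (2.2) \]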


Using for any two functions $f, g \in V_1$ the standard inner product
\[ \langle f, g \rangle = \int_0^1 f(t) g(t)\, dt, \]
we can write
\[ V_1 = V_0 \oplus W, \]
where $W$ is the relative orthogonal complement of $V_0$ in $V_1$, and $\oplus$ denotes orthogonal summation. The dimension of $W$ is $n$, so that there is a basis function $\psi_k$ for every index $k \in \Omega$, or in other words for each newly inserted knot $t_k$.

Nonzero functions $\psi \in W$ with minimal support are called B-wavelets. The general theory for B-wavelets developed in [6] establishes in this special case that there are $n$ different piecewise linear B-wavelets which form a basis of the wavelet space $W$. Each such B-wavelet is uniquely determined up to a constant multiple. There are two boundary B-wavelets $\psi_1$ and $\psi_{2n-1}$ and $n - 2$ interior B-wavelets $\psi_k$ for $k \in \Omega_0$, which we will consider first. Each interior B-wavelet has support $[t_{k-3}, t_{k+3}]$, so that
\[ \psi_k(x) = \sum_{i=k-2}^{k+2} q_i^k \gamma_i(x) \quad \text{for } x \in [0, 1] \]
with the coefficients determined by $\psi_k \in W$, or in other words
\[ \langle \psi_k, \varphi_j \rangle = 0 \quad \text{for } j = 0, \dots, n. \]

An  Lp-stability  example  for  piecewise  linear  B-  Wavelets 

For the boundary wavelets $\psi_1$ and $\psi_{2n-1}$ we have to make some minor modifications. Their supports are $[t_0, t_4]$ and $[t_{2n-4}, t_{2n}]$, respectively, so that
\[ \psi_1(x) = \sum_{i=0}^{3} q_i^1 \gamma_i(x) \quad \text{and} \quad \psi_{2n-1}(x) = \sum_{i=2n-3}^{2n} q_i^{2n-1} \gamma_i(x) \quad \text{for } x \in [0, 1]. \]
In the paper [7] the values of all B-wavelet coefficients $q_i^k$ are given explicitly in terms of the knot locations for the standard setting described here. In the same paper estimates for the coefficients are used to derive $L_p$-stability estimates for these B-wavelets.

3  Stability  of  B-wavelets 

Our aim in this paper is to establish

Theorem 3.1 Given the B-wavelet basis $\{\psi_k\}_{k \in \Omega}$, then for $1 < p \le \infty$ there are no sets of weights $\alpha_{k,p}$, $k \in \Omega$, such that
\[ K_1 \|c\|_p \le \Big\| \sum_{k \in \Omega} c_k\, \alpha_{k,p}\, \psi_k \Big\|_p \le K_2 \|c\|_p \qquad (3.1) \]
holds for any wavelet coefficients $c = (c_1, c_3, \dots, c_{2n-1})$ and with absolute constants $K_1 > 0$ and $K_2 > 0$, which are completely independent of the choice of knot sequences $\tau$ and $s$.

Due to the finite dimension of $W$, it is clear that stability constants $K_1$ and $K_2$ exist, as any two norms on $W$ are equivalent. The pertinent question is how the weights could be chosen so that the constants are actually independent of the dimension, the $p$-norm and, if possible, the choice of new knots $s$. We will prove the assertion by assuming that the estimate (3.1) holds with constants independent of the knot sequences. Then the following special case serves as a counterexample to this assertion.

The old knot sequence $\tau$ consists of the equally spaced points
\[ \tau_0 = 0, \quad \tau_1 = 1/3, \quad \tau_2 = 2/3, \quad \tau_3 = 1. \]
We want to investigate what happens if two newly inserted points are positioned ever more closely, so we introduce the new knots as
\[ s_1 = 1/3 - \varepsilon, \quad s_2 = 1/3 + \eta, \quad s_3 = 5/6, \quad \text{for } 0 < \varepsilon, \eta < 1/3, \]
in order to find out what happens if both $\varepsilon \to 0+$ and $\eta \to 0+$.

Thus the fine knot sequence $t$ is
\[ t_0 = 0, \quad t_1 = 1/3 - \varepsilon, \quad t_2 = 1/3, \quad t_3 = 1/3 + \eta, \quad t_4 = 2/3, \quad t_5 = 5/6, \quad t_6 = 1. \]
The fine interval lengths are
\[ d_0 = 1/3 - \varepsilon, \quad d_1 = \varepsilon, \quad d_2 = \eta, \quad d_3 = 1/3 - \eta, \quad d_4 = 1/6, \quad d_5 = 1/6, \]
while
\[ \delta_1 = \delta_3 = \delta_5 = 1/3, \quad \text{and} \quad \delta_0 = 1/3 - \varepsilon, \quad \delta_2 = \varepsilon + \eta, \quad \delta_4 = 1/2 - \eta, \quad \delta_6 = 1/6. \]

In this setting any wavelet
\[ \psi = \sum_{i=0}^{6} q_i \gamma_i \in W \]
must be orthogonal to the coarse hat functions $\varphi_0, \dots, \varphi_3$. This actually means that the column vector $q$ of coefficients $q_i$ must satisfy the matrix equation
\[ A q = 0, \qquad (3.2) \]
where the entries of $A$ are the inner products of the coarse and fine hat functions, i.e.
\[ a_{j,i} = \langle \varphi_j, \gamma_i \rangle, \quad \text{for } j = 0, \dots, 3, \ i = 0, \dots, 6. \]
Direct computations using (2.1) and (2.2) yield as the only nonzero entries

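\[ a_{0,0} = -\tfrac12 \varepsilon^2 - \tfrac16 \varepsilon + \tfrac19, \quad a_{0,1} = \tfrac16 \varepsilon + \tfrac1{18}, \quad a_{0,2} = \tfrac12 \varepsilon^2, \]
\[ a_{1,0} = \tfrac12 \varepsilon^2 - \tfrac13 \varepsilon + \tfrac1{18}, \quad a_{1,1} = -\tfrac16 \varepsilon + \tfrac19, \quad a_{1,2} = -\tfrac12 \varepsilon^2 + \tfrac12 \varepsilon - \tfrac12 \eta^2 + \tfrac12 \eta, \]
\[ a_{1,3} = -\tfrac16 \eta + \tfrac19, \quad a_{1,4} = \tfrac12 \eta^2 - \tfrac13 \eta + \tfrac1{18}, \]
\[ a_{2,2} = \tfrac12 \eta^2, \quad a_{2,3} = \tfrac16 \eta + \tfrac1{18}, \quad a_{2,4} = -\tfrac12 \eta^2 - \tfrac16 \eta + \tfrac{13}{72}, \quad a_{2,5} = \tfrac1{12}, \quad a_{2,6} = \tfrac1{72}, \]
\[ a_{3,4} = \tfrac1{72}, \quad a_{3,5} = \tfrac1{12}, \quad a_{3,6} = \tfrac5{72}. \]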

We now investigate the B-wavelets $\psi_1$ and $\psi_3$ in detail, corresponding to $s_1 = t_1$ and $s_2 = t_3$. Specializing the results from [7] then yields all necessary B-wavelet coefficients for this setting up to a scaling factor. Note, however, that it is straightforward to check that the corresponding coefficient vectors satisfy the matrix equation (3.2).

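The coefficients of the boundary wavelet $\psi_1$ are
\[ q_0^1 = -\frac{3}{1 - 3\varepsilon}, \quad q_1^1 = 6 - \frac{9 \varepsilon \eta}{\varepsilon + \eta + 6 \varepsilon \eta}, \quad q_2^1 = -\frac{1 + 3\eta}{\varepsilon + \eta + 6 \varepsilon \eta}, \quad q_3^1 = \frac{9 \eta^2}{\varepsilon + \eta + 6 \varepsilon \eta}, \]
while the ones for the interior B-wavelet $\psi_3$ are
\[ q_1^3 = \frac{9 \varepsilon^2}{\varepsilon + \eta + 6 \varepsilon \eta}, \quad q_2^3 = -\frac{1 + 3\varepsilon}{\varepsilon + \eta + 6 \varepsilon \eta}, \quad q_3^3 = 3 + \frac{3(\varepsilon + \eta)}{2(\varepsilon + \eta + 6 \varepsilon \eta)} + \frac{9(1 - 2\eta)}{2(5 - 12\eta)}, \]
\[ q_4^3 = -\frac{9}{5 - 12\eta}, \quad q_5^3 = \frac{3}{2(5 - 12\eta)}. \]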
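
The orthogonality condition (3.2) can be verified exactly for sample values of $\varepsilon$ and $\eta$. The sketch below is one possible check, not the authors' computation: it assembles $A$ from the hat functions (2.1) and (2.2) with rational arithmetic (Simpson's rule is exact here because the integrand is a quadratic on each fine interval) and tests the coefficient vectors in the form $q_0^1 = -3/(1-3\varepsilon)$, $q_1^1 = 6 - 9\varepsilon\eta/D$, $q_2^1 = -(1+3\eta)/D$, $q_3^1 = 9\eta^2/D$ and $q_1^3 = 9\varepsilon^2/D$, $q_2^3 = -(1+3\varepsilon)/D$, $q_3^3 = 3 + 3(\varepsilon+\eta)/(2D) + 9(1-2\eta)/(2(5-12\eta))$, $q_4^3 = -9/(5-12\eta)$, $q_5^3 = 3/(2(5-12\eta))$, with $D = \varepsilon + \eta + 6\varepsilon\eta$. The helper names and the chosen values of `eps` and `eta` are assumptions.

```python
from fractions import Fraction as F

def hat(knots, j, x):
    """Piecewise linear hat function centred at knots[j] (one-sided at the ends)."""
    if j > 0 and knots[j - 1] <= x <= knots[j]:
        return (x - knots[j - 1]) / (knots[j] - knots[j - 1])
    if j < len(knots) - 1 and knots[j] <= x <= knots[j + 1]:
        return (knots[j + 1] - x) / (knots[j + 1] - knots[j])
    return F(0)

def inner_product(tau, j, t, i):
    """Exact integral over [0, 1] of phi_j * gamma_i via Simpson's rule per fine interval."""
    total = F(0)
    for a, b in zip(t, t[1:]):
        mid = (a + b) / 2
        values = [hat(tau, j, x) * hat(t, i, x) for x in (a, mid, b)]
        total += (b - a) / 6 * (values[0] + 4 * values[1] + values[2])
    return total

eps, eta = F(1, 10), F(1, 20)                      # illustrative values only
tau = [F(0), F(1, 3), F(2, 3), F(1)]
t = [F(0), F(1, 3) - eps, F(1, 3), F(1, 3) + eta, F(2, 3), F(5, 6), F(1)]
A = [[inner_product(tau, j, t, i) for i in range(7)] for j in range(4)]

D = eps + eta + 6 * eps * eta
q1 = [-3 / (1 - 3 * eps), 6 - 9 * eps * eta / D, -(1 + 3 * eta) / D,
      9 * eta**2 / D, 0, 0, 0]
q3 = [0, 9 * eps**2 / D, -(1 + 3 * eps) / D,
      3 + 3 * (eps + eta) / (2 * D) + 9 * (1 - 2 * eta) / (2 * (5 - 12 * eta)),
      -9 / (5 - 12 * eta), F(3) / (2 * (5 - 12 * eta)), 0]

def residual(q):
    """Return the vector A q, which should vanish for a wavelet."""
    return [sum(A[j][i] * q[i] for i in range(7)) for j in range(4)]

print(residual(q1), residual(q3))
```

Since B-wavelets are determined only up to a constant multiple, any common rescaling of `q1` or of `q3` passes the same check.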

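We first provide estimates for the $p$-norms of these B-wavelets.

Proposition 3.2 For small enough $\varepsilon$ and $\eta$, it holds for $1 \le p \le \infty$ that
\[ \|\psi_1\|_p \ge \frac{16}{45} \left( \frac12 \right)^{1/p} (\varepsilon + \eta)^{1/p - 1}, \qquad \|\psi_3\|_p \ge \frac{16}{45} \left( \frac12 \right)^{1/p} (\varepsilon + \eta)^{1/p - 1}. \]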

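Proof: For all $0 < \varepsilon, \eta < 1/3$ we find that
\[ |q_2^1| = \frac{1 + 3\eta}{\varepsilon + \eta + 6 \varepsilon \eta} \ge (\varepsilon + \eta)^{-1} \inf_{0 < \varepsilon', \eta' < 1/3} \frac{(1 + 3\eta')(\varepsilon' + \eta')}{\varepsilon' + \eta' + 6 \varepsilon' \eta'} = \frac89 (\varepsilon + \eta)^{-1} \]
and, similarly,
\[ |q_2^3| \ge \frac89 (\varepsilon + \eta)^{-1}. \]
Note that instead of $8/9$ we may write $1 - a$ for any $a > 0$ if $\varepsilon$ and $\eta$ are small enough, or even 1 if $\varepsilon = \eta$.

In the process $\varepsilon, \eta \to 0+$ all other coefficients $q_i^1$ and $q_i^3$ have finite limits. This means that for small enough $\varepsilon + \eta$
\[ \|\psi_1\|_\infty = \max_i |q_i^1| = |q_2^1| \ge \frac89 (\varepsilon + \eta)^{-1}, \qquad \|\psi_3\|_\infty = \max_i |q_i^3| = |q_2^3| \ge \frac89 (\varepsilon + \eta)^{-1}. \]
The absolute stability of piecewise linear B-splines (1.1) yields with $C_2 \le 5/2$ (see [1]) and $\delta_2 = \varepsilon + \eta$
\[ \|\psi_1\|_p = \Big\| \sum_{i=0}^{3} q_i^1 \gamma_i \Big\|_p \ge \frac{1}{C_2} \left( \frac12 \right)^{1/p} \left\| \left( q_0^1 \delta_0^{1/p},\ q_1^1 \delta_1^{1/p},\ q_2^1 \delta_2^{1/p},\ q_3^1 \delta_3^{1/p} \right) \right\|_p \ge \frac{16}{45} \left( \frac12 \right)^{1/p} (\varepsilon + \eta)^{1/p - 1}. \]
Analogously we get
\[ \|\psi_3\|_p = \Big\| \sum_{i=1}^{5} q_i^3 \gamma_i \Big\|_p \ge \frac{16}{45} \left( \frac12 \right)^{1/p} (\varepsilon + \eta)^{1/p - 1} \]
to complete the proof. □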
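
The constant $8/9$ in the proof can be examined on a grid. The sketch below evaluates the ratio $(1 + 3\eta)(\varepsilon + \eta)/(\varepsilon + \eta + 6\varepsilon\eta)$ exactly; the grid resolution is an arbitrary assumption, and the grid deliberately includes the endpoint $1/3$, where the infimum over the open square is approached (at $\varepsilon = 1/3$, $\eta = 1/9$).

```python
from fractions import Fraction as F

def ratio(eps, eta):
    """(eps + eta) * |q_2^1| = (1 + 3 eta)(eps + eta) / (eps + eta + 6 eps eta)."""
    return (1 + 3 * eta) * (eps + eta) / (eps + eta + 6 * eps * eta)

steps = 60
grid = [F(i, 3 * steps) for i in range(1, steps + 1)]    # points in (0, 1/3]
smallest = min(ratio(e, h) for e in grid for h in grid)
print(smallest)
```

On the diagonal $\varepsilon = \eta$ the ratio equals 1, in line with the remark following the estimate.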


Proof of Theorem 3.1: Let us now assume that with some scaling factor B-wavelets are absolutely stable in $p$-norm for $1 < p \le \infty$, i.e. there exist weights $\alpha_{k,p}$ so that the inequalities (3.1) hold with constants independent of the specific choice of knot sequences. Choosing in the current setting all coefficients equal to zero except for $c_1 = 1$, the stability inequality (3.1) yields
\[ \|\alpha_{1,p} \psi_1\|_p \le K_2, \]
or in other words, using Proposition 3.2,
\[ |\alpha_{1,p}| \le \frac{45 K_2}{16}\, 2^{1/p} (\varepsilon + \eta)^{1 - 1/p}, \qquad (3.3) \]
and by a similar argument
\[ |\alpha_{3,p}| \le \frac{45 K_2}{16}\, 2^{1/p} (\varepsilon + \eta)^{1 - 1/p}. \qquad (3.4) \]

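On the other hand, the stability estimate (3.1) yields for arbitrary $c_1$ and $c_3$, while setting all other $c_k$ to zero, that
\[ \|c_1 \alpha_{1,p} \psi_1 + c_3 \alpha_{3,p} \psi_3\|_p \ge K_1 \left( |c_1|^p + |c_3|^p \right)^{1/p} \ge K_1 \max(|c_1|, |c_3|). \]
Let us choose specifically
\[ c_1 = \alpha_{3,p} \quad \text{and} \quad c_3 = -\alpha_{1,p}, \]
which results in
\[ |\alpha_{1,p}\, \alpha_{3,p}|\, \|\psi_1 - \psi_3\|_p \ge K_1 \max(|\alpha_{1,p}|, |\alpha_{3,p}|), \]
leading with (3.3) and (3.4) to
\[ (\varepsilon + \eta)^{1 - 1/p}\, \|\psi_1 - \psi_3\|_p \ge \left( \frac12 \right)^{1/p} \frac{16 K_1}{45 K_2}. \qquad (3.5) \]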

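On the other hand we derive from the absolute stability of linear B-splines (1.1) that
\[ \|\psi_1 - \psi_3\|_p = \left\| q_0^1 \gamma_0 + (q_1^1 - q_1^3) \gamma_1 + (q_2^1 - q_2^3) \gamma_2 + (q_3^1 - q_3^3) \gamma_3 - q_4^3 \gamma_4 - q_5^3 \gamma_5 \right\|_p \]
\[ \le \left\| \left( q_0^1 \delta_0^{1/p},\ (q_1^1 - q_1^3) \delta_1^{1/p},\ (q_2^1 - q_2^3) \delta_2^{1/p},\ (q_3^1 - q_3^3) \delta_3^{1/p},\ q_4^3 \delta_4^{1/p},\ q_5^3 \delta_5^{1/p} \right) \right\|_p \]
\[ \le 6 \max \left( |q_0^1| \delta_0^{1/p},\ |q_1^1 - q_1^3| \delta_1^{1/p},\ |q_2^1 - q_2^3| \delta_2^{1/p},\ |q_3^1 - q_3^3| \delta_3^{1/p},\ |q_4^3| \delta_4^{1/p},\ |q_5^3| \delta_5^{1/p} \right). \]
All the terms
\[ |q_0^1| \delta_0^{1/p}, \quad |q_1^1 - q_1^3| \delta_1^{1/p}, \quad |q_3^1 - q_3^3| \delta_3^{1/p}, \quad |q_4^3| \delta_4^{1/p}, \quad |q_5^3| \delta_5^{1/p} \]
are in fact bounded from above for $\varepsilon + \eta \to 0+$, so that the expression
\[ (\varepsilon + \eta)^{1 - 1/p}\, |q_0^1| \delta_0^{1/p} \]
and the other such terms tend to zero.

Since $|q_2^1 - q_2^3| = 3 |\varepsilon - \eta| / (\varepsilon + \eta + 6 \varepsilon \eta)$ we obtain for the only remaining term
\[ (\varepsilon + \eta)^{1 - 1/p}\, |q_2^1 - q_2^3|\, \delta_2^{1/p} = \frac{3 |\varepsilon - \eta|}{1 + 6 \varepsilon \eta / (\varepsilon + \eta)}, \]
which goes to zero as well for $\varepsilon + \eta \to 0+$. As a consequence
\[ \lim_{\varepsilon + \eta \to 0+} (\varepsilon + \eta)^{1 - 1/p}\, \|\psi_1 - \psi_3\|_p = 0, \]
which contradicts (3.5). □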
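
The decay of the remaining term can be illustrated numerically. This is only a sketch: the function name `remaining_term` and the particular path $\eta = 2\varepsilon$ are assumptions made for the illustration. Since $\delta_2 = \varepsilon + \eta$, the powers of $\varepsilon + \eta$ combine and the expression does not depend on $p$.

```python
def remaining_term(eps, eta):
    """(eps+eta)^(1-1/p) |q_2^1 - q_2^3| delta_2^(1/p), simplified to
    3 |eps - eta| (eps + eta) / (eps + eta + 6 eps eta)."""
    return 3 * abs(eps - eta) * (eps + eta) / (eps + eta + 6 * eps * eta)

# Let both new knots approach the old knot 1/3 along eta = 2 * eps.
values = [remaining_term(10.0**-k, 2 * 10.0**-k) for k in range(1, 6)]
print(values)
```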

Remark  3.3  Although  we  have  chosen  an  example  with  one  boundary  and  one  interior 
B-wavelet,  let  us  remark  that  the  lack  of  absolute  stability  is  in  no  way  due  to  a  boundary 
effect.  A  completely  analogous  reasoning  is  possible  if  one  chooses  knot  sequences  with 
more  interior  knots  and  studies  the  behaviour  for  two  interior  B-wavelets  once  two  new 
knots  coalesce.  Similarly  just  two  boundary  B-wavelets  could  be  used  on  an  even  shorter 
knot sequence, where there are no interior B-wavelets at all.

Abstract

In this paper we consider the support properties of locally linearly independent refinable function vectors $\Phi = (\phi_1, \dots, \phi_r)^T$. We propose an algorithm for computing the global support of the components of $\Phi$. Further, for $\Phi = (\phi_1, \phi_2)^T$ we investigate the supports, especially the possibility of holes of refinable function vectors if local linear independence is assumed. Finally, we give some necessary conditions for local linear independence in terms of rank conditions for special matrices given by the refinement mask. However, we are not able to give a final answer to the question whether a locally linearly independent function vector can have more than one hole.


1  Introduction 


Let $\Phi = (\phi_1, \dots, \phi_r)^T$, $r \in \mathbb{N}$, be a vector of compactly supported continuous functions on $\mathbb{R}$. The function vector $\Phi$ is said to be refinable if it satisfies a vector refinement equation
\[ \Phi(x) = \sum_{k \in \mathbb{Z}} A(k)\, \Phi(2x - k), \quad x \in \mathbb{R}, \qquad (1.1) \]
where $\{A(k)\}$ is a finitely supported sequence of real $(r \times r)$-matrices.

Refinable function vectors play a basic role in the theory of multiwavelets. In recent years the properties of refinable function vectors have been investigated very extensively. In fact, it is possible to characterize properties like approximation order and regularity of $\Phi$ and $L_2$-stability of the basis generated by $\Phi$ completely by means of the refinement mask $\{A(k)\}$ [1, 6, 7, 11].

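We say that $\Phi$ is $L_2$-stable if there are constants $0 < A \le B < \infty$ such that for any sequences $c_1, \dots, c_r \in \ell_2(\mathbb{Z})$,
\[ A \sum_{\nu=1}^{r} \sum_{k \in \mathbb{Z}} |c_\nu(k)|^2 \le \Big\| \sum_{\nu=1}^{r} \sum_{k \in \mathbb{Z}} c_\nu(k)\, \phi_\nu(\cdot - k) \Big\|_{L_2}^2 \le B \sum_{\nu=1}^{r} \sum_{k \in \mathbb{Z}} |c_\nu(k)|^2. \]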


In some applications one needs not only $L_2$-stability of the basis generated by $\Phi$ but other stronger conditions of linear independence. We say that $\Phi$ is globally linearly independent
Locally linearly independent function vectors
if for any sequences $c_1, \dots, c_r$ on $\mathbb{Z}$
\[ \sum_{\nu=1}^{r} \sum_{k \in \mathbb{Z}} c_\nu(k)\, \phi_\nu(\cdot - k) = 0 \quad \text{on } \mathbb{R} \]
implies that $c_\nu(k) = 0$ for all $\nu = 1, \dots, r$ and all $k \in \mathbb{Z}$ (see [8, 5]).

The following definition is even more restrictive: A function vector $\Phi$ is said to be linearly independent on a nonempty open subset $G$ of $\mathbb{R}$ if for any sequences $c_1, \dots, c_r$ on $\mathbb{Z}$
\[ \sum_{\nu=1}^{r} \sum_{k \in \mathbb{Z}} c_\nu(k)\, \phi_\nu(\cdot - k) = 0 \quad \text{on } G \]
implies that $c_\nu(k) = 0$ for all $k \in I_\nu(G)$, $\nu = 1, \dots, r$, where $I_\nu(G)$ contains all $k \in \mathbb{Z}$ with $\phi_\nu(\cdot - k) \not\equiv 0$ on $G$. Finally, $\Phi$ is said to be locally linearly independent if it is linearly independent on any nonempty open subset $G$ of $\mathbb{R}$.

Obviously, local linear independence of $\Phi$ implies global linear independence, and global linear independence of $\Phi$ implies $L_2$-stability. It has been shown by Sun [12] that for compactly supported, refinable functions ($r = 1$) with dilation factor 2 the notions of local and global linear independence are equivalent. However, this is no longer true for function vectors [4].

For (scalar) refinable functions $\phi$, local linear independence implies that $\phi$ has integer support, i.e., $\mathrm{supp}\, \phi$ starts and ends with an integer, and $\mathrm{supp}\, \phi$ does not contain holes, i.e., $\mathrm{supp}\, \phi$ is an interval.

Now, one can ask, 'is this also true for locally linearly independent refinable function vectors?' Unfortunately this is not the case. In [10] it has been shown that a component of $\Phi$ can have a hole. However, it is not clear whether a refinable, locally linearly independent function vector can also have components with finitely many or even infinitely many holes.

In this paper, we want to investigate support properties of locally linearly independent function vectors and consider the 'hole problem' more closely. In the second section we briefly recall a characterization of local linear independence for function vectors in terms of the mask $\{A(k)\}$. In Section 3, we present an algorithm for computing the starting points and endpoints of the support of the components $\phi_\nu$ of $\Phi$.

In the remaining part of the paper we restrict ourselves to the special case $\Phi = (\phi_1, \phi_2)^T$. We collect some observations on function vectors with holes in Section 4 and show that holes can only occur in special situations. In Section 5 we give necessary conditions for local linear independence in terms of rank conditions for matrices formed by the mask $\{A(k)\}$. In Section 6 we prove that the function vector $\Phi$ given in Example 4.1 is continuous and locally linearly independent. Finally, we summarize our findings in the conclusion. However, the question put in the title of this paper cannot be answered completely. We conjecture that it is not possible to have locally linearly independent function vectors with more than one hole.

2 Preliminaries

Let us start with some notation. For a compactly supported, continuous function $\phi : \mathbb{R} \to \mathbb{R}$, let $\mathrm{supp}\, \phi$ be the closure of the set where $\phi$ does not vanish. Further, let the global support $\mathrm{gsupp}\, \phi$ be the smallest interval containing $\mathrm{supp}\, \phi$. The function $\phi$ is said to have a hole if there is an interval $I$ of Lebesgue measure greater than zero which is a subset of $\mathrm{gsupp}\, \phi$ and on which $\phi$ is identically zero. The function vector $\Phi$ is said to contain a hole if one of its components has a hole.

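For a characterization of locally linearly independent function vectors we briefly recall
the result of Goodman, Jia and Zhou [4]. Let $\Phi$ satisfy the refinement equation (1.1),
where the mask matrices $A(k)$ are zero matrices for $k < 0$ and for $k > N$. Considering
the vector

$$\boldsymbol{\Phi}(x) = (\Phi(x + k))_{k=0}^{N-1}$$

of length $rN$ and the $(rN \times rN)$-block matrices

$$\mathcal{A}_0 = (A(2k - l))_{k,l=0}^{N-1}, \qquad \mathcal{A}_1 = (A(2k - l + 1))_{k,l=0}^{N-1}, \qquad (2.1)$$

the refinement equation can equivalently be written as

$$\boldsymbol{\Phi}(x/2) = \mathcal{A}_0\, \boldsymbol{\Phi}(x) \quad \text{and} \quad \boldsymbol{\Phi}((x + 1)/2) = \mathcal{A}_1\, \boldsymbol{\Phi}(x), \qquad x \in [0, 1].$$

For $\epsilon_1, \ldots, \epsilon_n \in \{0, 1\}$ it follows that

$$\boldsymbol{\Phi}\Big(\frac{\epsilon_1}{2} + \cdots + \frac{\epsilon_n}{2^n} + \frac{x}{2^n}\Big) = \mathcal{A}_{\epsilon_1} \cdots \mathcal{A}_{\epsilon_n}\, \boldsymbol{\Phi}(x), \qquad x \in [0, 1).$$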
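The block construction (2.1) can be tried out on the simplest refinable function. The following is a minimal sketch, assuming the scalar hat function with support $[0,2]$ (so $r = 1$, $N = 2$) as test case; the helper names `block_matrices`, `matvec`, `hat` and `stacked` are illustrative and not taken from the paper. It checks the two one-step identities and one two-step identity in exact rational arithmetic.

```python
from fractions import Fraction as F

def block_matrices(mask, N, r):
    """Return the block matrices with blocks A(2k-l) and A(2k-l+1), k, l = 0..N-1."""
    zero = [[F(0)] * r for _ in range(r)]

    def build(shift):
        # row index runs over (k, i), column index over (l, j)
        return [[mask.get(2 * k - l + shift, zero)[i][j]
                 for l in range(N) for j in range(r)]
                for k in range(N) for i in range(r)]

    return build(0), build(1)

def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def hat(x):
    """Hat function with support [0, 2] and hat(1) = 1 (test case chosen here)."""
    return max(F(0), 1 - abs(x - 1))

# hat(x) = hat(2x)/2 + hat(2x - 1) + hat(2x - 2)/2
mask = {0: [[F(1, 2)]], 1: [[F(1)]], 2: [[F(1, 2)]]}
N, r = 2, 1
A0, A1 = block_matrices(mask, N, r)

def stacked(x):
    """The vector (hat(x + k)), k = 0..N-1."""
    return [hat(x + k) for k in range(N)]

x = F(1, 3)
one_step_0 = stacked(x / 2) == matvec(A0, stacked(x))
one_step_1 = stacked((x + 1) / 2) == matvec(A1, stacked(x))
# two factors with (eps_1, eps_2) = (1, 0): argument 1/2 + 0/4 + x/4
two_step = stacked(F(1, 2) + x / 4) == matvec(A1, matvec(A0, stacked(x)))
print(A0, A1, one_step_0, one_step_1, two_step)
```

Exact fractions are used so that the identities are tested as equalities rather than up to rounding.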

Now let $v_0$ be a right eigenvector of $\mathcal{A}_0$ to the eigenvalue 1. This eigenvector is unique
(up to multiplication with a constant) if $\Phi$ is stable (see [3]). Let $V$ be the minimal
common invariant subspace of $\{\mathcal{A}_0, \mathcal{A}_1\}$ generated by $v_0$. Then $V$ contains the vectors
$\boldsymbol{\Phi}(x)$, $x \in [0, 1)$, since $\boldsymbol{\Phi}(0) = c\,v_0$ with some constant $c$ and each $x \in [0, 1)$ can be
represented as a limit of a sequence of dyadic numbers $l/2^n$, $l \in \mathbb{Z}$, $n = 1, 2, \ldots$. Further,
let $M$ be an $(rN \times \dim V)$-matrix such that the columns of $M$ form a basis of $V$. Then
we have from [4]

Theorem 2.1 Let $\Phi$ be a refinable vector of compactly supported, continuous functions
satisfying (1.1) with $A(k) = 0$ for $k < 0$ and $k > N$. Then we have

(1) $\Phi$ is linearly independent on $(0, 1)$ if and only if all nonzero rows of $M$ are linearly
independent.

(2) $\Phi$ is locally linearly independent if and only if for all n with 0 < n < 2rN and all
$\epsilon_1, \ldots, \epsilon_n \in \{0, 1\}$ the nonzero rows of $\mathcal{A}_{\epsilon_n} \cdots \mathcal{A}_{\epsilon_1} M$ are linearly independent.

Remark 2.2 A similar characterization of local linear independence is possible also for
$L^1$-solutions of vector refinement equations (1.1) and even for distributions (see [2, 13]).
Some examples of locally linearly independent function vectors can be found in [4, 10].
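The row criterion of Theorem 2.1 lends itself to a direct computational check. The sketch below is only an illustration under stated assumptions: it uses the hat function matrices ($r = 1$, $N = 2$) as input, builds $M$ from $v_0$ by closing its span under both matrices, and tests products of up to `max_len` factors, where `max_len` is a hypothetical cut-off chosen here and not the bound on $n$ from the theorem. All function names are illustrative.

```python
from fractions import Fraction as F
from itertools import product

def rank(rows):
    """Exact rank by Gaussian elimination over the rationals."""
    m = [[F(v) for v in row] for row in rows]
    rk = 0
    for c in range(len(m[0]) if m else 0):
        piv = next((i for i in range(rk, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[rk], m[piv] = m[piv], m[rk]
        for i in range(rk + 1, len(m)):
            f = m[i][c] / m[rk][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def nonzero_rows_independent(X):
    nz = [row for row in X if any(v != 0 for v in row)]
    return rank(nz) == len(nz)

def invariant_subspace_matrix(A0, A1, v0):
    """Matrix M whose columns span the smallest A0- and A1-invariant subspace containing v0."""
    basis, queue = [], [v0]
    while queue:
        v = queue.pop()
        if rank(basis + [v]) > len(basis):      # v enlarges the span
            basis.append(v)
            for A in (A0, A1):
                queue.append([sum(a * b for a, b in zip(row, v)) for row in A])
    return [list(col) for col in zip(*basis)]

def row_criterion(A0, A1, M, max_len):
    """Test the nonzero rows of A_{eps_n} ... A_{eps_1} M for all words up to length max_len."""
    for n in range(max_len + 1):
        for eps in product((0, 1), repeat=n):
            X = M
            for e in eps:
                X = matmul(A1 if e else A0, X)
            if not nonzero_rows_independent(X):
                return False
    return True

# Hat function on [0, 2]: block matrices and the eigenvector v0 = (hat(0), hat(1))
A0 = [[F(1, 2), F(0)], [F(1, 2), F(1)]]
A1 = [[F(1), F(1, 2)], [F(0), F(1, 2)]]
v0 = [F(0), F(1)]
M = invariant_subspace_matrix(A0, A1, v0)
passes = row_criterion(A0, A1, M, max_len=4)
print(len(M[0]), passes)
```

For the hat function both matrices are invertible, so every product has full rank and the check succeeds for any cut-off.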

3  Global support of $\Phi$

Now we want to give an algorithm for computing the global support of the components
of refinable function vectors $\Phi$ from the mask. To this end let us assume that the $(r \times r)$-matrices
$A(k)$ in (1.1) are of the form $A(k) = (A_{i,j}(k))_{i,j=1}^{r}$. We look for $\alpha_\nu, \beta_\nu \in \mathbb{R}$
with gsupp $\phi_\nu = [\alpha_\nu, \beta_\nu]$. Let for all pairs $i, j = 1, \ldots, r$,

$$s_{i,j} := \min\{k : A_{i,j}(k) \neq 0\},$$

$$g_{i,j} := \max\{k : A_{i,j}(k) \neq 0\}.$$

Observe that $s_{i,j}$, $g_{i,j}$ are integers. The numbers $\alpha_\nu$ can be found by the following
algorithm.

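Algorithm 3.1

Input: $s_{i,j}$, $i, j = 1, \ldots, r$.

(1) Let $p := (p_1, \ldots, p_r)$ be a vector of length $r$.

    For $\nu$ from 1 to $r$ do $\alpha_\nu := s_{\nu,\nu}$; $p_\nu := \nu$ enddo.

(2) For $\nu$ from 1 to $r$ do

        For $j$ from 1 to $r$ do

            if $s_{\nu,j} < 2\alpha_\nu - \alpha_j$ then $\alpha_\nu := (s_{\nu,j} + \alpha_j)/2$; $p_\nu := j$ endif
        enddo
    enddo.

(3) Repeat step (2) as long as the vector $p = (p_1, \ldots, p_r)$ changes.

(4) Form the $(r \times r)$-coefficient matrix $P = (P_{i,j})_{i,j=1}^{r}$ with

$$P_{i,j} := \begin{cases} 1 & \text{if } i = j \text{ and } i = p_i, \\ 2 & \text{if } i = j \text{ and } i \neq p_i, \\ -1 & \text{if } i \neq j \text{ and } j = p_i, \\ 0 & \text{elsewhere}, \end{cases}$$

    and the vectors $a := (\alpha_1, \ldots, \alpha_r)^T$, $s := (s_{1,p_1}, \ldots, s_{r,p_r})^T$ and solve the linear
    equation system $P a = s$.

Output: $a = (\alpha_1, \ldots, \alpha_r)^T$.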

Analogously we obtain the algorithm for the endpoints $\beta_\nu$:

Algorithm 3.2

Input: $g_{i,j}$, $i, j = 1, \ldots, r$.

(1) Let $p := (p_1, \ldots, p_r)$ be a vector of length $r$.

    For $\nu$ from 1 to $r$ do $\beta_\nu := g_{\nu,\nu}$; $p_\nu := \nu$ enddo.

(2) For $\nu$ from 1 to $r$ do

        For $j$ from 1 to $r$ do

            if $g_{\nu,j} > 2\beta_\nu - \beta_j$ then $\beta_\nu := (g_{\nu,j} + \beta_j)/2$; $p_\nu := j$ endif
        enddo
    enddo.

(3) Repeat step (2) as long as the vector $p = (p_1, \ldots, p_r)$ changes.

(4) Form the $(r \times r)$-coefficient matrix $P$ as defined in Algorithm 3.1, and the vectors
    $b := (\beta_1, \ldots, \beta_r)^T$, $g := (g_{1,p_1}, \ldots, g_{r,p_r})^T$ and solve the linear equation system
    $P b = g$.

Output: $b := (\beta_1, \ldots, \beta_r)^T$.
Proof:

The refinement equation (1.1) implies for each component $\phi_\nu$ that

$$\phi_\nu(x) = \sum_{k \in \mathbb{Z}} \sum_{j=1}^{r} A_{\nu,j}(k)\, \phi_j(2x - k).$$

In particular, it follows from the local linear independence that for all $k$ with $A_{\nu,j}(k) \neq 0$,

$$\mathrm{gsupp}\, \phi_j(2 \cdot - k) \subseteq \mathrm{gsupp}\, \phi_\nu, \qquad \nu, j = 1, \ldots, r,$$

that is $[(\alpha_j + k)/2, (\beta_j + k)/2] \subseteq [\alpha_\nu, \beta_\nu]$. Using the numbers $s_{i,j}$ and $g_{i,j}$ defined above,
we obtain $(\alpha_j + s_{\nu,j})/2 \geq \alpha_\nu$ and $(\beta_j + g_{\nu,j})/2 \leq \beta_\nu$, or equivalently,

$$2\alpha_\nu - \alpha_j \leq s_{\nu,j} \quad \text{and} \quad 2\beta_\nu - \beta_j \geq g_{\nu,j} \qquad (3.1)$$

for all $\nu, j = 1, \ldots, r$. In particular, for each fixed $\nu$ at least one of the $r$ inequalities in
(3.1) for the starting points (and for the endpoints, respectively) must be an equality.

Let us look at the first algorithm computing the starting points; the second works
analogously. In the first step of the algorithm we just put $\alpha_\nu := s_{\nu,\nu}$. These $s_{\nu,\nu}$ are upper
bounds of the true starting points of $\phi_\nu$ since, for $j = \nu$, (3.1) implies $\alpha_\nu \leq s_{\nu,\nu}$. Hence
it is clear that, if $2\alpha_\nu - \alpha_j$ is greater than $s_{\nu,j}$ for a fixed $\nu$ and some $j \in \{1, \ldots, r\}$,
then $\alpha_\nu$ must be reduced since $\alpha_j$ is already an upper bound for the starting point
of $\phi_j$. Putting now $\alpha_\nu := (s_{\nu,j} + \alpha_j)/2$ in step 2, we obtain again an upper bound of
$\alpha_\nu$. Repeating the second step of the algorithm we obtain decreasing sequences for $\alpha_\nu$
(being dyadic rationals, and) approaching the exact starting values. However, if the exact
starting values are not dyadic rationals then they cannot be obtained by a finite number
of repetitions of step 2. That is why we consider the vector $p$, which stores for each $\nu$ an
index $j = p_\nu$ for which the inequality in (3.1) is even an equality. Then step 2 must only
be repeated a few times in order to find the correct vector $p$. Now, we can use the $r$
equalities

$$2\alpha_\nu - \alpha_{p_\nu} = s_{\nu,p_\nu}$$

in order to compute $\alpha_\nu$ directly. By a suitable rearranging of the equations one obtains
an $(r \times r)$-coefficient matrix

$$\begin{pmatrix} P_1 & 0 & \cdots & 0 & 0 \\ 0 & P_2 & & \vdots & \vdots \\ \vdots & & \ddots & 0 & \\ 0 & \cdots & 0 & P_K & 0 \\ \multicolumn{4}{c}{R} & D \end{pmatrix}, \qquad (3.2)$$

where $P_l$, $l = 1, \ldots, K$, are circulant matrices of the form

$$\begin{pmatrix} 2 & -1 & \cdots & 0 \\ 0 & 2 & -1 & \vdots \\ \vdots & & \ddots & -1 \\ -1 & 0 & \cdots & 2 \end{pmatrix},$$
$D$ is a diagonal matrix with diagonal elements 1 or 2, and $R$ is a matrix of dimension
$\dim D \times (r - \dim D)$, with one nonvanishing entry in each row at most. For example, in
the case $p = (1, 2, \ldots, r)$, $P$ is just the $(r \times r)$-identity matrix, i.e., $\dim D = r$ and the
matrices $P_l$ and $R$ do not occur in $P$. For $p = (2, 3, \ldots, r, 1)$ we find $P = P_1$ and $D$ as
well as $R$ vanish. If $p$ contains smaller 'cycles' of the form $(p_{n_1}, \ldots, p_{n_\mu})$ with $p_{n_j} = n_{j+1}$,
$j = 1, \ldots, \mu - 1$ and $p_{n_\mu} = n_1$, then each cycle corresponds to a circulant matrix $P_l$ in $P$.
Since the circulant matrices $P_l$ are invertible, the equation system is uniquely solvable.
□
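The invertibility of the circulant blocks used in the last step can be made explicit. The following short derivation is a sketch, assuming that $P_l$ has size $m \times m$ and writing $C$ for the cyclic shift matrix with $C^m = I$; both $m$ and $C$ are notation introduced here for illustration.

```latex
\begin{align*}
P_l &= 2I - C, \qquad C^m = I
  \;\Longrightarrow\; \text{the eigenvalues of } P_l \text{ are } 2 - \omega^j,\quad
  \omega = e^{2\pi i/m},\ j = 0, \ldots, m-1, \\
\det P_l &= \prod_{j=0}^{m-1} \bigl(2 - \omega^j\bigr) = 2^m - 1 \neq 0 .
\end{align*}
```

The product is the polynomial $z^m - 1 = \prod_j (z - \omega^j)$ evaluated at $z = 2$. For a cycle of length 3 this gives $\det P_l = 7$, which is consistent with the denominators 7 appearing in the solution of Example 3.3.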

Example 3.3 Let $r = 4$ and let the values $s_{i,j}$, $i, j = 1, 2, 3, 4$, be given by a $(4 \times 4)$-matrix
[matrix entries illegible in the source; the steps below use $s_{\nu,\nu} = 1$ for $\nu = 1, \ldots, 4$ and $s_{1,4} = 0$, $s_{2,1} = s_{3,1} = 1$, $s_{4,2} = 0$].

Algorithm 3.1 gives

step 1: $a^T = (\alpha_1, \alpha_2, \alpha_3, \alpha_4) = (1, 1, 1, 1)$ and $p = (1, 2, 3, 4)$

step 2: $a^T = (\alpha_1, \alpha_2, \alpha_3, \alpha_4) = (1/2, 3/4, 3/4, 3/8)$ and $p = (4, 1, 1, 2)$

step 3: one repetition of step 2:
$a^T = (\alpha_1, \alpha_2, \alpha_3, \alpha_4) = (3/16, 19/32, 19/32, 19/64)$ and $p = (4, 1, 1, 2)$.
Since $p$ did not change, no further repetition of step 2 is necessary.

step 4: We obtain

$$P = \begin{pmatrix} 2 & 0 & 0 & -1 \\ -1 & 2 & 0 & 0 \\ -1 & 0 & 2 & 0 \\ 0 & -1 & 0 & 2 \end{pmatrix},$$

which can be simply changed into a matrix of the form (3.2) by rearranging the equations
for the vector $a' = (\alpha_1, \alpha_4, \alpha_2, \alpha_3)^T$. The system $P a = s$ with $s = (0, 1, 1, 0)^T$ gives
$a = (1/7, 4/7, 4/7, 2/7)^T$.

Remark 3.4 In [10] it has been shown that for locally linearly independent refinable
function vectors $\Phi = (\phi_1, \ldots, \phi_r)^T$ the starting points and the endpoints of gsupp $\phi_\nu$,
$\nu = 1, \ldots, r$, are rational numbers of the form $k + \sigma$, where $k \in \mathbb{Z}$ and $\sigma \in J_r$ with

$$J_r := \Big\{ \frac{l}{(2^\kappa - 1)\, 2^{r-\kappa}} : \kappa = 1, \ldots, r,\ l = 0, \ldots, (2^\kappa - 1)\, 2^{r-\kappa} \Big\}.$$
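The steps of Algorithm 3.1 can be sketched in a few lines of code. The version below is a minimal illustration, not the authors' implementation: indices are 0-based, the linear system of step (4) is solved exactly with a small Gauss-Jordan routine, and the input matrix `S` is a hypothetical completion of the data of Example 3.3 (only the diagonal and the entries $s_{1,4}$, $s_{2,1}$, $s_{3,1}$, $s_{4,2}$ are fixed by the steps of the example; all remaining entries are set to 3 as an assumption).

```python
from fractions import Fraction as F

def solve(P, s):
    """Solve P a = s exactly by Gauss-Jordan elimination (P assumed invertible)."""
    n = len(P)
    aug = [[F(v) for v in row] + [F(b)] for row, b in zip(P, s)]
    for c in range(n):
        piv = next(i for i in range(c, n) if aug[i][c] != 0)
        aug[c], aug[piv] = aug[piv], aug[c]
        aug[c] = [v / aug[c][c] for v in aug[c]]
        for i in range(n):
            if i != c and aug[i][c] != 0:
                f = aug[i][c]
                aug[i] = [a - f * b for a, b in zip(aug[i], aug[c])]
    return [row[-1] for row in aug]

def starting_points(S):
    """Sketch of Algorithm 3.1; S[v][j] plays the role of s_{v,j}."""
    r = len(S)
    alpha = [F(S[v][v]) for v in range(r)]          # step (1)
    p = list(range(r))
    while True:                                     # steps (2) and (3)
        p_old = p[:]
        for v in range(r):
            for j in range(r):
                if S[v][j] < 2 * alpha[v] - alpha[j]:
                    alpha[v] = (S[v][j] + alpha[j]) / 2
                    p[v] = j
        if p == p_old:
            break
    P = [[0] * r for _ in range(r)]                 # step (4)
    for i in range(r):
        if p[i] == i:
            P[i][i] = 1
        else:
            P[i][i] = 2
            P[i][p[i]] = -1
    s = [S[i][p[i]] for i in range(r)]
    return solve(P, s), p

# Hypothetical completion of the matrix of Example 3.3 (unknown entries set to 3)
S = [[1, 3, 3, 0],
     [1, 1, 3, 3],
     [1, 3, 1, 3],
     [3, 0, 3, 1]]
alpha, p = starting_points(S)
print(alpha, p)
```

With this input the sweep stabilises after one repetition with $p = (4, 1, 1, 2)$ in the 1-based notation of the text, and the exact solve returns the starting points $(1/7, 4/7, 4/7, 2/7)$. Algorithm 3.2 is obtained by replacing the minimum data by $g_{i,j}$ and reversing the inequality.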

4  Function  vectors  with  holes 

In contrast with the scalar case, where a locally linearly independent refinable function
cannot have a hole, for function vectors this need no longer be true.

Example 4.1 Let $\Phi = (\phi_1, \phi_2)^T$ satisfy

$$\Phi(x) = \begin{pmatrix} 1/9 & 2/9 \\ 1/3 & 1/3 \end{pmatrix} \Phi(2x) + \begin{pmatrix} 1/3 & 1/3 \\ 1 & 0 \end{pmatrix} \Phi(2x - 1) + \begin{pmatrix} 2/3 & 0 \\ 1/3 & 0 \end{pmatrix} \Phi(2x - 2) + \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} \Phi(2x - 7).$$

Fig. 1. Locally linearly independent function vector $\Phi = (\phi_1, \phi_2)^T$ with a hole.

Hence $\mathcal{A}_0$ and $\mathcal{A}_1$ in (2.1) are $(14 \times 14)$-matrices. The function vector $\Phi$ is uniquely
determined by the refinement equation (up to multiplication by a constant). Further,
gsupp $\phi_1 = [0, 3]$ and gsupp $\phi_2 = [0, 5]$, and $\phi_2$ possesses a hole of length 1, namely
$\phi_2(x) = 0$ for $x \in (5/2, 7/2)$ (cf. Figure 1). As we shall show in Section 6, $\Phi$ is continuous
and locally linearly independent.

Further, one can simply find function vectors $\Phi$ with infinitely many holes (but not
being locally linearly independent).

Example 4.2 Let $\Phi = (\phi_1, \phi_2)^T$ with

$$\phi_1(x) = \tfrac{1}{2}\phi_1(2x) + \phi_1(2x - 1) + \tfrac{1}{2}\phi_1(2x - 2), \qquad \phi_2(x) = \tfrac{1}{2}\phi_2(2x) + \phi_1(2x - 4).$$

Fig. 2. Function vector $\Phi = (\phi_1, \phi_2)^T$ with infinitely many holes.

Here $\mathcal{A}_0$, $\mathcal{A}_1$ in (2.1) are $(8 \times 8)$-matrices. Observe that $\phi_1$ is just the hat function
with supp $\phi_1 = [0, 2]$ and $\phi_2$ is a fractal function with gsupp $\phi_2 = [0, 3]$, formed by
infinitely many 'hats' of support length $2^{-j}$, $j = 0, 1, \ldots$, and with infinitely many holes
of the form $2^{-j}(3/2, 2)$, $j = 0, 1, \ldots$ (cf. Figure 2). Of course, this function vector is not
locally linearly independent, since $\phi_1$ is refinable by itself (see also the proof of Theorem
4.3).
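The hole structure of Example 4.2 can be checked numerically. The sketch below assumes the refinement equations $\phi_1(x) = \tfrac12\phi_1(2x) + \phi_1(2x-1) + \tfrac12\phi_1(2x-2)$ and $\phi_2(x) = \tfrac12\phi_2(2x) + \phi_1(2x-4)$ and evaluates $\phi_2$ through the series $\sum_j 2^{-j}\phi_1(2^{j+1}x - 4)$ obtained by iterating the second equation; the truncation parameter `terms` is a choice made here for illustration.

```python
from fractions import Fraction as F

def phi1(x):
    """Hat function with support [0, 2] and phi1(1) = 1."""
    return max(F(0), 1 - abs(x - 1))

def phi2(x, terms=60):
    """Truncated series sum_j 2^-j phi1(2^(j+1) x - 4); exact for x >= 2^-50, say."""
    return sum(F(1, 2 ** j) * phi1(2 ** (j + 1) * x - 4) for j in range(terms))

# phi2 satisfies its refinement equation on a dyadic grid in [1/64, 3]
grid = [F(k, 64) for k in range(1, 193)]
refinable = all(phi2(x) == phi2(2 * x) / 2 + phi1(2 * x - 4) for x in grid)

# midpoints of the holes 2^-j (3/2, 2) and of the hats supported on 2^-j [2, 3]
hole_values = [phi2(F(7, 4) / 2 ** j) for j in range(3)]
peak_values = [phi2(F(5, 2) / 2 ** j) for j in range(3)]
print(refinable, hole_values, peak_values)
```

The hole midpoints give the value 0 and the hat midpoints give $2^{-j}$, matching the description of $\phi_2$ as a sequence of shrinking hats separated by holes.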

We want to consider the support properties of function vectors $\Phi$ more closely, and
investigate in which cases the components of $\Phi$ can have holes.
In the remaining part of the paper, we only investigate the case $r = 2$, i.e.,
$\Phi = (\phi_1, \phi_2)^T$.

Theorem 4.3 Let $\Phi = (\phi_1, \phi_2)^T$ be a refinable, locally linearly independent vector of
compactly supported, continuous functions with gsupp $\phi_\nu = [\alpha_\nu, \beta_\nu]$ and let $l_\nu = \beta_\nu - \alpha_\nu$,
$\nu = 1, 2$, be the lengths of the global supports with $l_1 \leq l_2$. Suppose that $\Phi$ contains holes.
Then we have

(1) The support lengths satisfy $l_2/2 \leq l_1 < l_2$.

(2) There exist compactly supported, continuous functions $f_1$, $f_2$ such that $\phi_2 = f_1 + f_2$
and the vector $(\phi_1, f_1, f_2)^T$ is refinable.

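Proof: Since $\Phi$ contains holes, there exists an open interval $I = (\gamma_1, \gamma_2)$ of greatest
length and a $\nu \in \{1, 2\}$ with $I \subset$ gsupp $\phi_\nu$, where $\phi_\nu$ vanishes on $I$. If there are several
intervals of greatest length (biggest holes) we just choose one of them. Refinability implies
for $x \in I$

$$\phi_\nu(x) = 0 = \sum_k A_{\nu,1}(k)\, \phi_1(2x - k) + A_{\nu,2}(k)\, \phi_2(2x - k).$$

Since $\Phi$ is locally linearly independent, it follows that

$$A_{\nu,1}(k) = 0 \quad \text{for } \mathrm{supp}\, \phi_1(2 \cdot - k) \cap I \neq \emptyset,$$

$$A_{\nu,2}(k) = 0 \quad \text{for } \mathrm{supp}\, \phi_2(2 \cdot - k) \cap I \neq \emptyset.$$

The choice of $I$ as the greatest interval now implies that we can replace supp $\phi_\nu$ by
gsupp $\phi_\nu$, such that

$$\begin{aligned} A_{\nu,1}(k) &= 0 \quad \text{for } 2\gamma_1 - \beta_1 < k < 2\gamma_2 - \alpha_1, \\ A_{\nu,2}(k) &= 0 \quad \text{for } 2\gamma_1 - \beta_2 < k < 2\gamma_2 - \alpha_2. \end{aligned} \qquad (4.1)$$

Let now $f_1 := \phi_\nu \chi_{[\alpha_\nu, \gamma_1]}$ and $f_2 := \phi_\nu \chi_{[\gamma_2, \beta_\nu]}$, where $\chi_{[a,b]}$ denotes the characteristic
function of the interval $[a, b]$. Then $\phi_\nu = f_1 + f_2$ and from refinability and from (4.1) it
follows that

$$f_1(x) = \sum_{k \leq 2\gamma_1 - \beta_1} A_{\nu,1}(k)\, \phi_1(2x - k) + \sum_{k \leq 2\gamma_1 - \beta_2} A_{\nu,2}(k)\, \phi_2(2x - k),$$

$$f_2(x) = \sum_{k \geq 2\gamma_2 - \alpha_1} A_{\nu,1}(k)\, \phi_1(2x - k) + \sum_{k \geq 2\gamma_2 - \alpha_2} A_{\nu,2}(k)\, \phi_2(2x - k).$$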

If the hole $I$ were in $\phi_1$, then at least one of the two functions $f_1$, $f_2$ would have a
global support length less than $l_1/2$ and hence would vanish since gsupp $\phi_1(2 \cdot - k)$ and
gsupp $\phi_2(2 \cdot - k)$ have a length $\geq l_1/2$. Thus the hole must be in $\phi_2$, i.e., $\phi_2 = f_1 + f_2$.

For $l_1 = l_2$ we obtain a contradiction, since, with the same argument as before, one
of the two functions $f_1$, $f_2$ vanishes. Hence $l_2 > l_1$. In this case $(\phi_1, f_1, f_2)^T$ is obviously
a refinable vector of continuous functions.

It remains to show that $l_2/2 > l_1$ leads to a contradiction. For $l_2/2 > l_1$, $\phi_1$ must be
refinable by itself, since gsupp $\phi_2(2 \cdot - k)$ cannot be contained in gsupp $\phi_1$ for any $k \in \mathbb{Z}$.
In particular, from local linear independence we know that then $[\alpha_1, \beta_1]$ is an integer
interval and that $\phi_1$ has no holes. Further, since at least one of the two functions $f_1$, $f_2$
has a global support length less than $l_2/2$, it follows that this function is representable
by $\phi_1(2 \cdot - k)$, $k \in \mathbb{Z}$, only. Without loss of generality let

$$f_1(x) = \sum_{k \leq 2\gamma_1 - \beta_1} A_{2,1}(k)\, \phi_1(2x - k), \qquad x \in \mathbb{R}. \qquad (4.2)$$

Considering $\boldsymbol{\Phi}_1 = (\phi_1(\cdot + k))_{k=\alpha_1}^{\beta_1 - 1}$, local linear independence implies that the space
$V_1 = \mathrm{span}\{\boldsymbol{\Phi}_1(x) : x \in [0, 1)\}$ has full dimension $l_1$. Further, we consider

$$\boldsymbol{\Phi} = \Big( \boldsymbol{\Phi}_1^T,\ \big((f_1(\cdot + k))_{k=\lfloor \alpha_2 \rfloor}^{\lceil \gamma_1 \rceil - 1}\big)^T,\ \big((f_2(\cdot + k))_{k=\lfloor \gamma_2 \rfloor}^{\lceil \beta_2 \rceil - 1}\big)^T \Big)^T.$$

(Here, for $x \in \mathbb{R}$, $\lfloor x \rfloor$ denotes the greatest integer less than or equal to $x$ and $\lceil x \rceil$
denotes the smallest integer greater than or equal to $x$.) Now, choosing a matrix $M$ of
basis vectors of the space $V = \mathrm{span}\{\boldsymbol{\Phi}(x) : x \in [0, 1)\}$, then, because of (4.2), the rows
of $M$ corresponding to $f_1$ depend on the first $l_1$ rows (corresponding to $\phi_1$). However,
not all $f_1$-rows can be zero rows since $f_1$ is not a zero function. But this contradicts the
local linear independence condition by Theorem 2.1. □

Corollary 4.4 Let $\Phi = (\phi_1, \phi_2)^T$ be a refinable, locally linearly independent vector of
compactly supported, continuous functions with gsupp $\phi_\nu = [\alpha_\nu, \beta_\nu]$ and $l_\nu = \beta_\nu - \alpha_\nu$,
$\nu = 1, 2$. Suppose that $l_1 \leq l_2$. Then we have: If $l_1 = l_2$ or $l_1 < l_2/2$ then $\phi_1$, $\phi_2$ do not
possess holes.
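As a quick plausibility check, the length condition of Theorem 4.3 (1) can be evaluated for the two examples of this section, using only the global supports stated there ($[0,3]$ and $[0,5]$ for Example 4.1, $[0,2]$ and $[0,3]$ for Example 4.2):

```latex
\begin{align*}
\text{Example 4.1:}\quad & l_1 = 3,\ l_2 = 5, &
  \tfrac{l_2}{2} = \tfrac{5}{2} \;\leq\; l_1 = 3 \;<\; l_2 = 5, \\
\text{Example 4.2:}\quad & l_1 = 2,\ l_2 = 3, &
  \tfrac{l_2}{2} = \tfrac{3}{2} \;\leq\; l_1 = 2 \;<\; l_2 = 3 .
\end{align*}
```

In both cases the support lengths lie in the range where Corollary 4.4 does not exclude holes. For Example 4.2 the condition is merely compatible with the observed holes, since that vector is not locally linearly independent and the theorem does not apply to it.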

Lemma 4.5 Let $\Phi = (\phi_1, \phi_2)^T$ be a refinable, locally linearly independent vector of
compactly supported, continuous functions. Then $\Phi$ has no holes that start or end with
an integer.

Proof: Suppose $\Phi$ has a hole which ends with an integer. Choose a hole $(\gamma_1, \gamma_2)$ of this
type with biggest length. Without loss of generality assume that this hole is in $\phi_2$. Then,
at least in a small right neighborhood of 0, $\phi_2(\cdot + \gamma_2)$ is representable only by $\phi_1(2 \cdot + \alpha_1)$
and $\phi_2(2 \cdot + \alpha_2)$. Recall from [10] that the supports gsupp $\phi_1 = [\alpha_1, \beta_1]$, gsupp $\phi_2 = [\alpha_2, \beta_2]$
satisfy

$$\alpha_\nu = k + c_1, \quad \beta_\nu = l + c_2, \qquad k, l \in \mathbb{Z}, \quad c_1, c_2 \in \{0, 1/2, 1/3, 2/3\}.$$

Now, if both $\alpha_1$ and $\alpha_2$ are integers, then $\phi_1(x + \alpha_1)$, $\phi_2(x + \alpha_2)$, $\phi_2(x + \gamma_2)$ are linearly
dependent in some suitable interval $x \in [0, \epsilon)$, $\epsilon > 0$, since they can be represented by
the two functions $\phi_1(2x + \alpha_1)$, $\phi_2(2x + \alpha_2)$. This is a contradiction to the local linear
independence. If only one $\alpha_\nu$, $\nu \in \{1, 2\}$, is an integer, then $\phi_\nu(x + \alpha_\nu)$ and $\phi_2(x + \gamma_2)$
are representable only by $\phi_\nu(2x + \alpha_\nu)$ in some interval $x \in [0, \epsilon)$ as before and we again
obtain a contradiction. If neither $\alpha_1$ nor $\alpha_2$ is an integer, then $\phi_2(x + \gamma_2)$ cannot be
represented by integer translates of $\phi_\nu(2x)$, $\nu = 1, 2$, contradicting the refinability.
Analogously, the contradiction follows for holes starting with an integer. □

Let us call a hole $(\gamma_1, \gamma_2)$ in $\Phi$ a biggest hole if there is no other hole in $\Phi$ of double
size of the form $(2\gamma_1 + k, 2\gamma_2 + k)$ with some $k \in \mathbb{Z}$.

Lemma 4.6 Let $\Phi = (\phi_1, \phi_2)^T$ be a refinable, locally linearly independent vector of
compactly supported, continuous functions. Then there is at most one biggest hole in $\Phi$.

Proof: Assume that $\Phi$ has two biggest holes. Let again $l_1$, $l_2$ denote the lengths of the
global supports of $\phi_1$, $\phi_2$ and suppose that $l_1 \leq l_2$. Then $\phi_1$ cannot have a biggest hole by
Theorem 4.3. Hence the two holes must be in $\phi_2$ and we get a partition $\phi_2 = f_1 + f_2 + f_3$
analogously as in the proof of Theorem 4.3 such that (gsupp $f_1$) $\cup$ (gsupp $f_2$) $\cup$ (gsupp $f_3$) $\subset$
gsupp $\phi_2$. Further, by refinability, each function $f_1$, $f_2$, $f_3$ can be represented by
$\phi_1(2 \cdot - k)$, $\phi_2(2 \cdot - k)$, $k \in \mathbb{Z}$. Moreover, at least one of the three functions $f_1$, $f_2$, $f_3$ must
contain a translate of $\phi_2(2 \cdot)$, otherwise at least two of the functions $f_1$, $f_2$, $f_3$ would be
linearly dependent in a suitable interval inside the starting intervals, since $\phi_1(2 \cdot - k)$
either starts at $\mathbb{Z} + \alpha_1/2$ or at $\mathbb{Z} + (\alpha_1 + 1)/2$ (depending on whether $k$ is even or odd).
Hence the support lengths satisfy $l_2 > l_2/2 + 2\,(l_1/2)$, i.e., $l_1 < l_2/2$. But this contradicts Corollary 4.4.
□

Remark 4.7 All results in this section can be generalized to $r > 2$ and to $L^1$-integrable
functions, if the characterization of local linear independence in [2] is used.

5  Rank  conditions  for  matrices  formed  by  the  refinement  mask 

We again restrict ourselves to the case that $\Phi = (\phi_1, \phi_2)^T$ is a vector of compactly
supported, continuous functions satisfying the refinement equation (1.1) with $A(k) = 0$
for $k < 0$ and $k > N$.

Let us consider the matrices $\mathcal{A}_0$ and $\mathcal{A}_1$ in (2.1) and the minimal common invariant
subspace $V$ of $\{\mathcal{A}_0, \mathcal{A}_1\}$ generated by $v_0$ as defined in Section 2. Recall that $V$ contains
$\boldsymbol{\Phi}(x)$, $x \in [0, 1)$. Let $M$ be an $(rN \times \dim V)$-matrix such that the columns of $M$ form a
basis of $V$. Now delete all components in the vector $\boldsymbol{\Phi} = (\Phi(x + k))_{k=0}^{N-1}$ corresponding to
zero rows in $M$ in order to get $\bar{\boldsymbol{\Phi}}$. Further, delete the corresponding rows and columns
in the matrices $\mathcal{A}_0$ and $\mathcal{A}_1$ in (2.1) in order to obtain $\bar{\mathcal{A}}_0$ and $\bar{\mathcal{A}}_1$ with

$$\bar{\boldsymbol{\Phi}}(x/2) = \bar{\mathcal{A}}_0\, \bar{\boldsymbol{\Phi}}(x), \qquad \bar{\boldsymbol{\Phi}}((x + 1)/2) = \bar{\mathcal{A}}_1\, \bar{\boldsymbol{\Phi}}(x), \qquad x \in [0, 1]. \qquad (5.1)$$

Deleting the zero rows and the corresponding columns in $M$ we obtain $\bar{M}$.

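Example 5.1 Let us consider Example 4.1. Here $\boldsymbol{\Phi}$ is a vector of length 14 and
$V = \mathrm{span}\{(\Phi(x + k))_{k=0}^{6} : x \in [0, 1)\}$. Since supp $\phi_1 = [0, 3]$ and supp $\phi_2 \subset [0, 5]$, it follows
that the rows of $M$ corresponding to $\phi_1(x + j)$, $j = 3, 4, 5, 6$, and $\phi_2(x + j)$, $j = 5, 6$, are
zero rows. Indeed, these are all zero rows of $M$, i.e., $V$ has dimension 8. We delete these
components of $\boldsymbol{\Phi}(x)$ and obtain

$$\bar{\boldsymbol{\Phi}}(x) = (\phi_1(x), \phi_2(x), \phi_1(x+1), \phi_2(x+1), \phi_1(x+2), \phi_2(x+2), \phi_2(x+3), \phi_2(x+4))^T$$

as well as

$$\bar{\mathcal{A}}_0 = \frac{1}{9} \begin{pmatrix} 1 & 2 & 0 & 0 & 0 & 0 & 0 & 0 \\ 3 & 3 & 0 & 0 & 0 & 0 & 0 & 0 \\ 6 & 0 & 3 & 3 & 1 & 2 & 0 & 0 \\ 3 & 0 & 9 & 0 & 3 & 3 & 0 & 0 \\ 0 & 0 & 0 & 0 & 6 & 0 & 3 & 2 \\ 0 & 0 & 0 & 0 & 3 & 0 & 0 & 3 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 9 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}, \qquad \bar{\mathcal{A}}_1 = \frac{1}{9} \begin{pmatrix} 3 & 3 & 1 & 2 & 0 & 0 & 0 & 0 \\ 9 & 0 & 3 & 3 & 0 & 0 & 0 & 0 \\ 0 & 0 & 6 & 0 & 3 & 3 & 2 & 0 \\ 0 & 0 & 3 & 0 & 9 & 0 & 3 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 3 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 9 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 9 & 0 & 0 & 0 \end{pmatrix}.$$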
Let us call a row of $\bar{\mathcal{A}}_0$ (resp. $\bar{\mathcal{A}}_1$) a $\phi_1$-row if it corresponds to a $\phi_1$-entry in $\bar{\boldsymbol{\Phi}}$ and a
$\phi_2$-row if it corresponds to a $\phi_2$-entry.

Let $n$ be the length of the new vector $\bar{\boldsymbol{\Phi}}$, so that $\bar{\mathcal{A}}_0$, $\bar{\mathcal{A}}_1$ are $(n \times n)$-matrices. If $\Phi$
is a locally linearly independent vector then Theorem 2.1 implies that $\bar{M}$ is an invertible
$(n \times n)$-matrix.

Deleting the first $\phi_1$-row and the first $\phi_2$-row and the corresponding columns in $\bar{\mathcal{A}}_0$
we obtain a new matrix $B$ of dimension $(n-2) \times (n-2)$. The same matrix $B$ is obtained if
we delete the last $\phi_1$-row and the last $\phi_2$-row and corresponding columns in $\bar{\mathcal{A}}_1$. Further,
the structure of $\bar{\mathcal{A}}_0$, $\bar{\mathcal{A}}_1$ implies that

$$\mathrm{spec}\, \bar{\mathcal{A}}_0 = \mathrm{spec}\, J_0 \cup \mathrm{spec}\, B, \qquad \mathrm{spec}\, \bar{\mathcal{A}}_1 = \mathrm{spec}\, J_1 \cup \mathrm{spec}\, B,$$

where $J_0$ (resp. $J_1$) is a $2 \times 2$-matrix containing the entries of $\bar{\mathcal{A}}_0$ (resp. $\bar{\mathcal{A}}_1$) being at
the same time in the first $\phi_1$- or $\phi_2$-row (resp. last $\phi_1$- or $\phi_2$-row) and in the first $\phi_1$- or
$\phi_2$-column (resp. last $\phi_1$- or $\phi_2$-column). (Here spec $A$ denotes the set of eigenvalues of
a matrix $A$.)

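Example 5.2 For $\Phi = (\phi_1, \phi_2)^T$ in Example 5.1 we obtain the matrix $B$ after deleting
the first and second row and corresponding columns in $\bar{\mathcal{A}}_0$ or by deleting the 5th and
8th row and corresponding columns in $\bar{\mathcal{A}}_1$. Hence,

$$B = \frac{1}{9} \begin{pmatrix} 3 & 3 & 1 & 2 & 0 & 0 \\ 9 & 0 & 3 & 3 & 0 & 0 \\ 0 & 0 & 6 & 0 & 3 & 2 \\ 0 & 0 & 3 & 0 & 0 & 3 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 9 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}, \qquad J_0 = \frac{1}{9} \begin{pmatrix} 1 & 2 \\ 3 & 3 \end{pmatrix}, \qquad J_1 = \frac{1}{9} \begin{pmatrix} 0 & 3 \\ 9 & 0 \end{pmatrix},$$

where $J_0$ and $J_1$ are invertible.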
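The construction of $B$, $J_0$ and $J_1$ can be reproduced mechanically from a mask. The sketch below assumes the mask of Example 4.1 with nonzero matrices $A(0)$, $A(1)$, $A(2)$, $A(7)$ as written in the code (a reconstruction, to be treated as an assumption), builds the $(14 \times 14)$ block matrices of (2.1), keeps the eight components listed in Example 5.1, and extracts the three submatrices. The helper names and the 0-based index lists are illustrative.

```python
from fractions import Fraction as F

# Assumed mask of Example 4.1 (reconstruction): all other A(k) are zero
mask = {
    0: [[F(1, 9), F(2, 9)], [F(1, 3), F(1, 3)]],
    1: [[F(1, 3), F(1, 3)], [F(1), F(0)]],
    2: [[F(2, 3), F(0)], [F(1, 3), F(0)]],
    7: [[F(0), F(0)], [F(1), F(0)]],
}
N, r = 7, 2
ZERO = [[F(0)] * r for _ in range(r)]

def block_matrix(shift):
    """Blocks A(2k - l + shift), k, l = 0..N-1; shift 0 and 1 give the two matrices of (2.1)."""
    return [[mask.get(2 * k - l + shift, ZERO)[i][j]
             for l in range(N) for j in range(r)]
            for k in range(N) for i in range(r)]

def submatrix(X, idx):
    return [[X[i][j] for j in idx] for i in idx]

def times9(X):
    return [[9 * v for v in row] for row in X]

# kept components: phi1(x+k), k = 0, 1, 2 and phi2(x+k), k = 0, ..., 4
keep = [0, 1, 2, 3, 4, 5, 7, 9]
A0_bar = submatrix(block_matrix(0), keep)
A1_bar = submatrix(block_matrix(1), keep)

B_from_A0 = submatrix(A0_bar, [2, 3, 4, 5, 6, 7])   # drop rows/columns 1 and 2
B_from_A1 = submatrix(A1_bar, [0, 1, 2, 3, 5, 6])   # drop rows/columns 5 and 8
J0 = submatrix(A0_bar, [0, 1])
J1 = submatrix(A1_bar, [4, 7])
print(times9(B_from_A0), times9(J0), times9(J1))
```

Both deletions lead to the same matrix $B$, and the fifth row of $B$ is a zero row, so that $\mathrm{rank}(B) \leq n - 3$ for $n = 8$.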

We  obtain 

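Theorem 5.3 Let $\Phi = (\phi_1, \phi_2)^T$ be a refinable, locally linearly independent vector of
compactly supported, continuous functions. Further, let $\bar{\mathcal{A}}_0$, $\bar{\mathcal{A}}_1$ and $B$ be given as above.
Then we have

(1) $\mathrm{rank}(J_0) \geq 1$ and $\mathrm{rank}(J_1) \geq 1$,

(2) $\mathrm{rank}(B) \geq n - 3$,

(3) $\mathrm{rank}(\bar{\mathcal{A}}_0) \geq n - 2$ and $\mathrm{rank}(\bar{\mathcal{A}}_1) \geq n - 2$,

(4) $|\mathrm{rank}(\bar{\mathcal{A}}_0) - \mathrm{rank}(\bar{\mathcal{A}}_1)| \leq 1$.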

Proof: (1) First observe that $J_0$ and $J_1$ at least have rank 1, otherwise a component
of $\bar{\boldsymbol{\Phi}}(x)$, $x \in [0, 1)$, would completely vanish, contradicting the definition of $\bar{\boldsymbol{\Phi}}$.

Let gsupp $\phi_1 = [\alpha_1, \beta_1]$ and gsupp $\phi_2 = [\alpha_2, \beta_2]$. Then, one simple eigenvalue zero in
$J_0$ implies that $\alpha_1 \in \mathbb{Z}$, $\alpha_2 \in \mathbb{Z} + 1/2$ or vice versa. If $J_0$ has two eigenvalues 0 then
the geometric multiplicity of 0 must be 1 and we obtain $\alpha_1 \in \mathbb{Z} + 1/3$, $\alpha_2 \in \mathbb{Z} + 2/3$ or
vice versa. Analogously, a corresponding behavior of $J_1$ implies $\beta_1 \in \mathbb{Z} + 1/2$, $\beta_2 \in \mathbb{Z}$
or vice versa, and $\beta_1 \in \mathbb{Z} + 2/3$, $\beta_2 \in \mathbb{Z} + 1/3$ or vice versa, respectively.

(2) If the matrix $B$ possesses the eigenvalue zero, then both $\bar{\mathcal{A}}_0$ and $\bar{\mathcal{A}}_1$ possess the
eigenvalue zero. Hence, $\bar{\mathcal{A}}_0 \bar{M}$ and $\bar{\mathcal{A}}_1 \bar{M}$ are not invertible, while $\bar{M}$ is an invertible
matrix. Thus, by Theorem 2.1, $\bar{\mathcal{A}}_0$ and $\bar{\mathcal{A}}_1$ have a zero row, which is not the first or
last $\phi_1$- or $\phi_2$-row. Hence, also $B$ has a zero row and, by construction, if $\bar{\mathcal{A}}_0$ has the
zero row in the $l$-th $\phi_i$-row, $i \in \{1, 2\}$, then $\bar{\mathcal{A}}_1$ must have a zero row in the $(l - 1)$-th
$\phi_i$-row. This means by (5.1), the two zero rows imply a hole in $\Phi$ containing the
interval $(k - 1/2, k + 1/2)$, for some $k \in \mathbb{Z}$. This hole must be a biggest hole. If $B$
has the eigenvalue zero with geometric multiplicity greater than 1, then with the same
arguments one obtains a second biggest hole in $\Phi$. But this contradicts the local linear
independence by Lemma 4.6. Hence $\mathrm{rank}(B) \geq n - 3$.

(3) The above considerations directly imply that $\mathrm{rank}(\bar{\mathcal{A}}_0) \geq n - 2$ and
$\mathrm{rank}(\bar{\mathcal{A}}_1) \geq n - 2$.

(4) Now, if $\bar{\mathcal{A}}_0$ has rank $n - 2$, then $B$ has rank $n - 3$ and hence $\bar{\mathcal{A}}_1$ can have rank
$n - 1$ at most. Analogously, $\mathrm{rank}(\bar{\mathcal{A}}_1) = n - 2$ implies $\mathrm{rank}(\bar{\mathcal{A}}_0) \leq n - 1$. □

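From Theorem 5.3 it follows that we have to investigate the following five cases:

(1) $\mathrm{rank}(\bar{\mathcal{A}}_0) = \mathrm{rank}(\bar{\mathcal{A}}_1) = n$,

(2) $\mathrm{rank}(\bar{\mathcal{A}}_0) = \mathrm{rank}(\bar{\mathcal{A}}_1) = n - 1$,

(3) $\mathrm{rank}(\bar{\mathcal{A}}_0) = \mathrm{rank}(\bar{\mathcal{A}}_1) = n - 2$,

(4) $\mathrm{rank}(\bar{\mathcal{A}}_0) = n - 1$, $\mathrm{rank}(\bar{\mathcal{A}}_1) = n$,

(5) $\mathrm{rank}(\bar{\mathcal{A}}_0) = n - 1$, $\mathrm{rank}(\bar{\mathcal{A}}_1) = n - 2$.

All further cases can be reduced to one of the above. However, some of these cases may
contradict the local linear independence assumption for $\Phi$.

Considering the first two cases, we obtain a partial answer to the question of whether
the support of $\phi_i$, $i = 1, 2$, can have holes. Moreover, we obtain sufficient conditions for
the local linear independence of $\Phi$ in terms of rank conditions for $\bar{\mathcal{A}}_0$, $\bar{\mathcal{A}}_1$.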

For  the  first  case  we  obtain: 

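Theorem 5.4 Let $\Phi = (\phi_1, \phi_2)^T$ be a refinable vector of compactly supported, continuous
functions. Let the space $V = \mathrm{span}\{\bar{\boldsymbol{\Phi}}(x) : x \in [0, 1)\}$ have full dimension, i.e. $\bar{M}$,
formed by basis vectors of $V$, is an invertible $(n \times n)$-matrix. Let $\bar{\mathcal{A}}_0$, $\bar{\mathcal{A}}_1$ be given as
above. Then $\mathrm{rank}(\bar{\mathcal{A}}_0) = \mathrm{rank}(\bar{\mathcal{A}}_1) = n$ implies that $\Phi$ is locally linearly independent and
has no holes.

Proof: The assertion on local linear independence is already proved in [4], Theorem
3.2. Since $\bar{\mathcal{A}}_0$, $\bar{\mathcal{A}}_1$ are invertible, the matrix $\bar{\mathcal{A}}_{\epsilon_1} \cdots \bar{\mathcal{A}}_{\epsilon_n} \bar{M}$ never has a zero row, hence
from

$$\bar{\boldsymbol{\Phi}}\Big(\frac{\epsilon_1}{2} + \cdots + \frac{\epsilon_n}{2^n} + \frac{x}{2^n}\Big) = \bar{\mathcal{A}}_{\epsilon_1} \cdots \bar{\mathcal{A}}_{\epsilon_n}\, \bar{\boldsymbol{\Phi}}(x), \qquad x \in [0, 1), \qquad (5.2)$$

it follows that there is no dyadic interval where $\phi_1$ or $\phi_2$ vanishes. Thus $\Phi$ has no holes.

□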

For  the  second  case  we  find 

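Theorem 5.5 Let $\Phi = (\phi_1, \phi_2)^T$ be a refinable vector of compactly supported, continuous
functions. Let the space $V = \mathrm{span}\{\bar{\boldsymbol{\Phi}}(x) : x \in [0, 1)\}$ have full dimension, i.e. $\bar{M}$,
formed by basis vectors of $V$, is an invertible $(n \times n)$-matrix. Let $\bar{\mathcal{A}}_0$, $\bar{\mathcal{A}}_1$ and $B$ be given
as above. Further, let $\mathrm{rank}(\bar{\mathcal{A}}_0) = \mathrm{rank}(\bar{\mathcal{A}}_1) = n - 1$ and let each of these matrices have one
zero row. Then we have

(1) If $\mathrm{rank}(B) = n - 2$ and the four matrices $\bar{\mathcal{A}}_0\bar{\mathcal{A}}_0$, $\bar{\mathcal{A}}_0\bar{\mathcal{A}}_1$, $\bar{\mathcal{A}}_1\bar{\mathcal{A}}_0$, $\bar{\mathcal{A}}_1\bar{\mathcal{A}}_1$ have rank
$n - 1$, then $\Phi$ is locally linearly independent and has no holes.

(2) If $\mathrm{rank}(B) = n - 3$ and the four matrices $\bar{\mathcal{A}}_0\bar{\mathcal{A}}_0$, $\bar{\mathcal{A}}_0\bar{\mathcal{A}}_1$, $\bar{\mathcal{A}}_1\bar{\mathcal{A}}_0$, $\bar{\mathcal{A}}_1\bar{\mathcal{A}}_1$ have rank $n - 1$,
then $\Phi$ is locally linearly independent and has one hole of the form $(k - 1/2, k + 1/2)$
for some $k \in \mathbb{Z}$.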

Proof: (1) We consider the first case. Since $\mathrm{rank}(B) = n - 2$, it follows that $B$ is
invertible and the zero row of $\bar{\mathcal{A}}_0$ must be the first $\phi_1$-row or the first $\phi_2$-row. Analogously,
the zero row of $\bar{\mathcal{A}}_1$ must be the last $\phi_1$- or $\phi_2$-row. Since $\mathrm{rank}(\bar{\mathcal{A}}_0\bar{\mathcal{A}}_0) = \mathrm{rank}(\bar{\mathcal{A}}_1\bar{\mathcal{A}}_1) = n - 1$,
it follows that $J_0$ and $J_1$ only have a simple eigenvalue zero, and the assumptions
(1) of the theorem imply that all matrix products $\bar{\mathcal{A}}_{\epsilon_1} \cdots \bar{\mathcal{A}}_{\epsilon_m} \bar{M}$, $m \in \mathbb{N}$, have rank
$n - 1$ and one zero row, namely the same as $\bar{\mathcal{A}}_0$ if $\epsilon_1 = 0$ and the same as $\bar{\mathcal{A}}_1$ if $\epsilon_1 = 1$.
The assumption on $V$ in the theorem already ensures that $\Phi$ is linearly independent on
$(0, 1)$. Now the above observations also imply that, by Theorem 2.1, $\Phi$ is locally linearly
independent.

The zero row in $\bar{\mathcal{A}}_0$ implies that the support of one component of $\Phi$ starts with an
integer and the support of the other with a half integer. Considering the zero row in $\bar{\mathcal{A}}_1$
we also find that the support of one component ends with an integer and the support of
the other with a half integer. In particular, from (5.2) it follows that $\Phi$ cannot have holes.

(2) We consider the second case. Since $\mathrm{rank}(B) = n - 3$, it follows that $B$ possesses the
eigenvalue zero and the zero rows of $\bar{\mathcal{A}}_0$ and $\bar{\mathcal{A}}_1$ are not the first or the last $\phi_1$- or $\phi_2$-rows.
Moreover, as shown in the proof of Theorem 5.3, if the $l$-th $\phi_i$-row, $i \in \{1, 2\}$, of
$\bar{\mathcal{A}}_0$ is a zero row then the $(l - 1)$-th $\phi_i$-row of $\bar{\mathcal{A}}_1$ is also a zero row, and this implies
by (5.1) a hole of the form $(k - 1/2, k + 1/2)$ for some $k \in \mathbb{Z}$ in $\phi_i$. Further, the rank
conditions (2) of the theorem imply that all matrix products $\bar{\mathcal{A}}_{\epsilon_1} \cdots \bar{\mathcal{A}}_{\epsilon_m} \bar{M}$, $m \in \mathbb{N}$, have
rank $n - 1$ and either a zero row in the $l$-th or in the $(l - 1)$-th row. Thus, by Theorem
2.1, $\Phi$ is locally linearly independent and has only one hole. □

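Remark 5.6 Example 4.1 satisfies the assumptions of Theorem 5.5 (2). An example
satisfying Theorem 5.5 (1) can be found in [10].

Observe that the case (2) is not completely settled by Theorem 5.5 since for
$\mathrm{rank}(\bar{\mathcal{A}}_0) = \mathrm{rank}(\bar{\mathcal{A}}_1) = n - 1$ some of the four matrices $\bar{\mathcal{A}}_0\bar{\mathcal{A}}_0$, $\bar{\mathcal{A}}_0\bar{\mathcal{A}}_1$, $\bar{\mathcal{A}}_1\bar{\mathcal{A}}_0$, $\bar{\mathcal{A}}_1\bar{\mathcal{A}}_1$ can also have
rank $n - 2$. Indeed, there exist locally linearly independent function vectors, where
$\mathrm{rank}(\bar{\mathcal{A}}_0) = \mathrm{rank}(\bar{\mathcal{A}}_1) = n - 1$ and $\mathrm{rank}(\bar{\mathcal{A}}_0\bar{\mathcal{A}}_0) = \mathrm{rank}(\bar{\mathcal{A}}_1\bar{\mathcal{A}}_1) = n - 2$, see [10]. The
remaining cases are more complicated to handle and we cannot give a final answer to
the question of whether a locally linearly independent refinable vector $\Phi$ can have more
than one hole.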

6  Proof  of  the  example 

In this section we want to verify the assertion that the function vector $\Phi$ given by the
refinement mask in Example 4.1 is continuous and locally linearly independent. Let us
first prove that $\Phi$ is continuous. To this end we use the following observation by Jia,
Riemenschneider and Zhou [9]:

Let $\{A(k)\}_{k=0}^{N}$ be a real refinement mask satisfying the following properties:

(1) $\frac{1}{2} \sum_{k=0}^{N} A(k)$ has one eigenvalue 1 and all further eigenvalues are inside the unit
circle.

(2) The matrices $\mathcal{A}_0$ and $\mathcal{A}_1$ both have the simple eigenvalue 1 and there is a vector
$e_1 \in \mathbb{R}^{Nr}$ with $e_1^T \mathcal{A}_0 = e_1^T \mathcal{A}_1 = e_1^T$.

(3) Considering the space $U = \{u \in \mathbb{R}^{rN} : e_1^T u = 0\}$, the joint spectral radius of $\mathcal{A}_0|_U$
and $\mathcal{A}_1|_U$ satisfies $\rho(\mathcal{A}_0|_U, \mathcal{A}_1|_U) < 1$.

Then the subdivision scheme associated with $\{A(k)\}_{k=0}^{N}$ converges in the maximum
norm, and hence the solution vector $\Phi$ of the refinement equation is continuous.

Here the joint spectral radius satisfies for any matrix norm

$$\rho(\mathcal{A}_0|_U, \mathcal{A}_1|_U) = \inf_{n \geq 1} \big( \max\{ \|\mathcal{A}_{\epsilon_1}|_U \cdots \mathcal{A}_{\epsilon_n}|_U\| : \epsilon_i \in \{0, 1\},\ i = 1, \ldots, n \} \big)^{1/n}.$$

For our example we find:

1) The matrix

$$\frac{1}{2} \sum_{k=0}^{7} A(k) = \begin{pmatrix} 5/9 & 5/18 \\ 4/3 & 1/6 \end{pmatrix}$$

possesses the eigenvalues 1 and $-5/18$.

2) The matrices $\mathcal{A}_0$ and $\mathcal{A}_1$ both have the simple eigenvalue 1 with the left eigenvector
$e_1^T = (3, 1, 3, 1, 3, 1, 3, 1, 3, 1, 3, 1, 3, 1)$.

3) The space $U = \{u \in \mathbb{R}^{14} : e_1^T u = 0\}$ has dimension 13 and we find the orthonormal
basis of $U$:

$u_1 = 28^{-1/2}\, (4, 0, 0, 0, -3, -1, 0, 0, 0, -1, 0, 0, 0, -1)^T$,
$u_2 = 110^{-1/2}\, (0, 0, 0, 0, -3, -1, 0, 0, 0, 0, 0, 0, 0, 10)^T$,
$u_3 = 130^{-1/2}\, (-3, 0, 0, 0, -3, -1, -3, 0, 0, -1, 0, 0, 10, -1)^T$,
$u_4 = 132^{-1/2}\, (0, 0, 0, 0, -3, -1, 0, 0, 0, 11, 0, 0, 0, -1)^T$,
$u_5 = 70^{-1/2}\, (-3, 0, 0, 0, -3, -1, 7, 0, 0, -1, 0, 0, 0, -1)^T$,
$u_6 = 208^{-1/2}\, (-3, 0, 0, 0, -3, -1, -3, 0, 0, -1, 13, 0, -3, -1)^T$,
$u_7 = 3540^{-1/2}\, (-3, -1, -3, 0, -3, -1, -3, 59, 0, -1, -3, -1, -3, -1)^T$,
$u_8 = 3660^{-1/2}\, (-3, -1, -3, 60, -3, -1, -3, -1, 0, -1, -3, -1, -3, -1)^T$,
$u_9 = 2352^{-1/2}\, (-3, 48, 0, 0, -3, -1, -3, 0, 0, -1, -3, 0, -3, -1)^T$,
$u_{10} = 3422^{-1/2}\, (-3, -1, -3, 0, -3, -1, -3, 0, 0, -1, -3, 58, -3, -1)^T$,
$u_{11} = 4270^{-1/2}\, (-9, -3, -9, -3, -9, -3, -9, -3, 61, -3, -9, -3, -9, -3)^T$,
$u_{12} = 10^{-1/2}\, (0, 0, 0, 0, 1, -3, 0, 0, 0, 0, 0, 0, 0, 0)^T$,
$u_{13} = 2842^{-1/2}\, (-9, -3, 49, 0, -9, -3, -9, 0, 0, -3, -9, 0, -9, -3)^T$.

The matrix representations of $\mathcal{A}_0|_U$, $\mathcal{A}_1|_U$ under this basis are $\mathcal{A}_0|_U = ((\mathcal{A}_0 u_j)^T u_k)_{j,k=1}^{13}$
and $\mathcal{A}_1|_U = ((\mathcal{A}_1 u_j)^T u_k)_{j,k=1}^{13}$, and a computation with Maple gives for the spectral
norm

$$\big( \max\{ \|\mathcal{A}_{\epsilon_1}|_U\, \mathcal{A}_{\epsilon_2}|_U\, \mathcal{A}_{\epsilon_3}|_U\|_2 : \epsilon_1, \epsilon_2, \epsilon_3 \in \{0, 1\} \} \big)^{1/3} < 0.95.$$

Hence $\Phi$ is continuous.

Let us prove the local linear independence of $\Phi$. Here we use Theorem 2.1 and a
procedure proposed by Goodman, Jia and Zhou [4]. The space $V \subset \mathbb{R}^{14}$ (as given in
Section 2) is spanned by the vector $v_0 = (0, 0, 9/5, 38/15, 6/5, 1, 0, 0, 0, 9/5, 0, 0, 0, 0)^T$
and by $\mathcal{A}_1 v_0$, $\mathcal{A}_0\mathcal{A}_1 v_0$, $\mathcal{A}_1\mathcal{A}_1 v_0$, $\mathcal{A}_0\mathcal{A}_0\mathcal{A}_1 v_0$, $\mathcal{A}_1\mathcal{A}_0\mathcal{A}_1 v_0$, $\mathcal{A}_0\mathcal{A}_0\mathcal{A}_0\mathcal{A}_1 v_0$, [...].

Here $v_0$ is a right eigenvector of $\mathcal{A}_0$ to the eigenvalue 1. Hence $\dim V = 8$. Forming the
matrix $M$, we observe that the 7-th, the 9-th and the last four rows of $M$ are zero rows.
Hence gsupp $\phi_1 = [0, 3]$ and gsupp $\phi_2 = [0, 5]$. The remaining 8 rows of $M$ are linearly
independent. Thus $\Phi$ is linearly independent on $(0, 1)$ by Theorem 2.1.

We can restrict our considerations to the shortened matrices $\mathcal{A}_0$, $\mathcal{A}_1$ as given in
Example 5.1. Further, we can choose the matrix $M$ as the identity matrix. The procedure
proposed in [4] gives rank $\mathcal{A}_0$ = rank $\mathcal{A}_0\mathcal{A}_0$ = rank $\mathcal{A}_0\mathcal{A}_1$ = 7 and the 7-th rows are zero;
rank $\mathcal{A}_1$ = rank $\mathcal{A}_1\mathcal{A}_1$ = rank $\mathcal{A}_1\mathcal{A}_0$ = 7 and the 6-th rows are zero.

Hence, $\Phi$ is locally linearly independent. Moreover, $\phi_2$ possesses a hole of length 1,
namely $\phi_2(x) = 0$ for $x \in (5/2, 7/2)$.

7  Conclusions 

In  Section  3  we  have  presented  an  algorithm  to  compute  the  global  supports  of  the  r 
components  of  a  compactly  supported  refinable  function  vector  from  the  refinement 
mask.  The  rest  of  the  paper  was  restricted  to  r  =  2. 

While for the scalar case local linear independence of a refinable function $\phi$ guarantees
that the support of $\phi$ is an integer interval without holes, this is no longer the case for
$r > 1$. As we have seen in Section 4, a function vector $\Phi = (\phi_1, \phi_2)^T$ can only have holes
if the lengths $l_1$ and $l_2$ of the global supports of $\phi_1$, $\phi_2$ satisfy $l_2/2 < l_1 < l_2$. As another
property, it has been shown that the endpoints of a hole cannot be integers. Further, $\Phi$
can have at most one biggest hole.

In Section 5 we have investigated matrices derived from the refinement mask. In
Theorem 5.3 some results on the rank of these matrices are obtained, leaving five different
cases to be investigated. The first case has been solved completely in Theorem 5.4. The
second case has been settled partially in Theorem 5.5. For the other cases we cannot give
a final answer. However, if $\mathcal{A}_0$ and $\mathcal{A}_1$ have different rank (as in case (4) and case (5))
then one can show by Theorem 2.1 that $\Phi$ must have infinitely many holes. In case (4)
this can be seen as follows. Since rank$(\mathcal{A}_0) = n - 1$ it follows that rank$(\mathcal{A}_1^k \mathcal{A}_0) = n - 1$ for
$k = 0, 1, \ldots$. Hence, by Theorem 2.1, $\mathcal{A}_1^k \mathcal{A}_0$ has a zero row for all $k = 0, 1, \ldots$, implying
that $\Phi$ contains vanishing intervals of the form $(l_k + (2^k - 1)/2^k,\ l_k + (2^k - 1/2)/2^k)$ with
suitable integers $l_k$. Here $l_k$ cannot be the same integer for all $k = 0, 1, 2, \ldots$; in particular
one finds $l_k \ne l_{k+1}$, $k \in \mathbb{N}$. Hence $\Phi$ has infinitely many holes. This observation leads to
the following

Conjecture 7.1 Let $\Phi = (\phi_1, \phi_2)^T$ be a refinable, locally linearly independent vector
of compactly supported, continuous functions. Then $\Phi$ cannot have more than one but
finitely many holes.

Our numerical computations however lead to the hypothesis that the cases (3), (4) and
(5) contradict the property of local linear independence. So we obtain

Conjecture 7.2 Let $\Phi = (\phi_1, \phi_2)^T$ be a refinable, locally linearly independent vector of
compactly supported, continuous functions. Then $\Phi$ cannot have infinitely many holes.

Acknowledgment The author thanks the referees for their valuable suggestions to

Abstract

We consider the univariate two-scale refinement equation $\varphi(x) = \sum_{k=0}^{N} c_k \varphi(2x - k)$,
where $c_0, \ldots, c_N$ are complex values and $\sum_k c_k = 2$. This paper analyses the correlation
between the existence of smooth compactly supported solutions of this equation and the
convergence of the corresponding cascade algorithm/subdivision scheme. In the work [11]
we have introduced a criterion that expresses this correlation in terms of the mask of the
equation. It is shown that the convergence of a subdivision scheme depends on the values that
the mask takes at the points of its generalized cycles. In this paper we show that the
criterion is sharp in the sense that an arbitrary generalized cycle causes the divergence
of a suitable subdivision scheme. To do this we construct a general method to produce
divergent subdivision schemes having smooth refinable functions. The criterion therefore
establishes a complete classification of divergent subdivision schemes.

1  Introduction 

Refinement equations have been studied by many authors in great detail in connection
with their role in the theory of wavelets and of subdivision schemes in approximation
theory and design of curves and surfaces (see [1-14]). In this paper we study a criterion
of convergence of subdivision processes having smooth refinable functions. This criterion
was presented in the work [11]. In particular we show that the criterion is sharp in
the sense that each of its cases is realized. To do this we provide a general procedure
for constructing divergent subdivision schemes (or cascade algorithms) corresponding to
smooth refinable functions.

We restrict ourselves to univariate equations with a compactly supported mask.
Throughout the paper we denote by $\mathbb{T} = \mathbb{R}/2\pi\mathbb{Z}$ the unit circle, by $\mathcal{H}$ the space of entire
functions on $\mathbb{C}$, by $C^l$ the space of $l$ times continuously differentiable functions on
$\mathbb{R}$, by $C^0 = C$ the space of continuous functions, by $C_0^l$ the space of compactly supported
functions from $C^l$, and by $C_0$ the space of compactly supported continuous functions on
$\mathbb{R}$. A sequence $\{f_k\}$ converges to zero in $C_0^l$ if it converges to zero in $C^l$ and the supports
of $f_k$, $k \in \mathbb{N}$, are uniformly bounded.

Consider a refinement equation

$$\varphi(x) = \sum_{k=0}^{N} c_k\, \varphi(2x - k), \qquad (1.1)$$

where $c_k \in \mathbb{C}$, $\sum_k c_k = 2$. The trigonometric polynomial $m(\xi) = \frac{1}{2}\sum_{k=0}^{N} c_k e^{-ik\xi}$ is
the mask of this equation. It is well known that a $C_0$-solution of this equation (refinable
function), if it exists at all, is unique up to normalization and has its support on the
segment $[0, N]$. For a given mask $m$ we denote by $[m]$ the corresponding refinement
equation. Let us also define the following subspaces of the space $C_0$:

$$\mathcal{M}^l = \{ f \in C_0 \mid \hat f(\xi)(1 - e^{-i\xi})^{-l-1} \in \mathcal{H} \}, \qquad \mathcal{L}^l = \{ f \in C_0^l \mid f^{(l)} \in \mathcal{M}^l \}, \qquad l \ge 0.$$

In other words the Fourier transform of a function from $\mathcal{M}^l$ has zeros of order $\ge l + 1$
at all the points $2\pi k$, $k \in \mathbb{Z}$. The Fourier transform of a function from $\mathcal{L}^l$ has a zero at
the point $\xi = 0$ and has zeros of order $\ge l + 1$ at all the points $2\pi k$, $k \in \mathbb{Z} \setminus \{0\}$. Let us
also denote $\mathcal{L} = \mathcal{L}^0 = \mathcal{M}^0$.

The cascade algorithm for refinement equations is the construction of the sequence
$f_n = T f_{n-1}$ for some initial function $f_0 \in C_0$, where $Tf(x) = \sum_k c_k f(2x - k)$ is the
subdivision operator associated to equation (1.1). This operator is defined on the space
$C_0$ and preserves all the subspaces $C_0^l$, $\mathcal{L}^l$. If $f_n$ converges in the space $C^l$ to a function
$\varphi \in C_0^l$ ($l \ge 0$), then obviously it converges in $C_0^l$ and $\varphi$ is the solution of (1.1). Moreover,
in that case the function $g = f_0 - \varphi$ necessarily belongs to $\mathcal{L}^l$ (see [1], [5]). Thus we say
that the cascade algorithm converges in $C^l$ if $T^n g \to 0$, $n \to \infty$, for any $g \in \mathcal{L}^l$. Properties
of the cascade algorithms have been studied by many authors in various contexts. This
algorithm gives a simple way of approximating refinable functions and wavelets. On
the other hand the convergence of the cascade algorithm is equivalent to the convergence
of the corresponding subdivision scheme ([4]). For a given mask $m(\xi)$ we say that the
subdivision process $\{m\}$ converges in $C^l$ if the corresponding cascade algorithm or the
corresponding subdivision scheme converges in that space.
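The iteration $f_n = T f_{n-1}$ can be sketched numerically. Iterating $T$ gives $f_n(x) = \sum_j \lambda^n_j f_0(2^n x - j)$, where $\lambda^0 = \delta$ and $\lambda^{n}_j = \sum_k c_{j-2k} \lambda^{n-1}_k$ is one step of the subdivision scheme. The example below is illustrative only: the coefficients $c = (1/4, 3/4, 3/4, 1/4)$ (the quadratic B-spline, with $\sum_k c_k = 2$) and the hat function chosen as $f_0$ are assumptions made for the demonstration, not data from the paper.

```python
def subdivide(c, lam):
    """One subdivision step: (S lam)_j = sum_k c[j - 2k] * lam[k]."""
    out = [0.0] * (2 * (len(lam) - 1) + len(c))
    for k, value in enumerate(lam):
        for i, ci in enumerate(c):
            out[2 * k + i] += ci * value
    return out


def hat(t):
    """Initial function f_0: hat function supported on [0, 2] with peak at t = 1."""
    return max(0.0, 1.0 - abs(t - 1.0))


def cascade(c, n, x):
    """Evaluate f_n(x) = sum_j lam_j * f_0(2**n * x - j), lam = S^n delta."""
    lam = [1.0]
    for _ in range(n):
        lam = subdivide(c, lam)
    t = (2 ** n) * x
    return sum(value * hat(t - j) for j, value in enumerate(lam))


# Quadratic B-spline coefficients (assumed example); the limit satisfies phi(1) = 1/2.
c = [0.25, 0.75, 0.75, 0.25]
values = [cascade(c, n, 1.0) for n in (1, 2, 3, 12)]
```

For this particular mask the iterates at $x = 1$ approach the B-spline value $1/2$, with an error that roughly halves at each step.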

It is clear that if a subdivision process converges in $C^l$ then the corresponding
refinement equation has a $C_0^l$-solution. In general the converse is not true; corresponding
examples are well known (see [1], [2], [13] for general discussions of this aspect). A
natural question arises: under which extra conditions does the solvability of a refinement equation
imply the convergence of the subdivision process?

1) A necessary condition (first introduced in [6]):

If a subdivision process $\{m\}$ converges in $C^l$, then its mask can be factored as

$$m(\xi) = \left( \frac{1 + e^{-i\xi}}{2} \right)^{l+1} a(\xi) \qquad (1.2)$$

for some trigonometric polynomial $a(\xi)$. In particular the condition

$$m(\xi) = \frac{1 + e^{-i\xi}}{2}\, a(\xi) \iff \sum_k c_{2k} = \sum_k c_{2k+1} = 1 \qquad (1.3)$$

is necessary for the convergence of the subdivision process in $C$. Let us recall that for
the existence of smooth solutions of a refinement equation this condition is not necessary
(there is a weaker condition for this, see [10]).

For a given mask $m$ denote by $l(m)$ the maximal integer $l$ such that condition (1.2)
is satisfied. So if a subdivision process $\{m\}$ converges in $C^k$, then $k \le l(m)$.

2) A sufficient condition (introduced in [1], developed in [8], [14], [7], [9]):

Suppose a mask $m$ satisfying (1.2) for some $l \ge 0$ has neither symmetric roots nor cycles;
then if the equation $[m]$ has a $C_0^l$-solution, then the process $\{m\}$ converges in $C^l$.

Let us recall the notation used in this statement. If, for a trigonometric polynomial
$p(\xi)$ and for some $\alpha \in \mathbb{T}$, we have $p(\alpha/2) = p(\pi + \alpha/2) = 0$, then $\{\alpha/2, \pi + \alpha/2\}$ is a
pair of symmetric roots for $p(\xi)$. To be definite, we agree that for any $\alpha \in \mathbb{T}$ the
element $\alpha/2 \in \mathbb{T}$ has the corresponding real value from the half-interval $[0, \pi)$. Further,
a given set $b = \{\beta_1, \ldots, \beta_n\} \subset \mathbb{T}$, where $n \ge 2$, is called cyclic if $2b = b$, i.e., $2\beta_j = \beta_{j+1}$
for $j = 1, \ldots, n$ (we set $\beta_{n+1} = \beta_1$). We consider only irreducible cyclic sets, for which
all the elements are different. Note that if two cyclic sets do not coincide, then they are
disjoint. A cyclic set $b$ is called a cycle of a trigonometric polynomial $p$ if $p(b + \pi) = 0$,
i.e., $p(\beta + \pi) = 0$ for all $\beta \in b$.

It is well known that the sufficient condition (2) for a mask $m$ is equivalent to the
stability of the corresponding refinable function (i.e., integer translates of the refinable
function possess the Riesz basis property in $L_2(\mathbb{R})$). It is also equivalent to say that the mask
satisfies Cohen's criterion (see for example [5, Proposition 2.4]). Actually condition (2)
was formulated for the case $l = 0$ only, but it can be easily extended to general $l$. This is
seen, for instance, from Theorem 2.2 of this paper.

Thus we have one necessary and one sufficient condition for the convergence of
subdivision processes having smooth refinable functions. It was a natural problem to fill this
gap and to elaborate a criterion in "if and only if" terms. In 1998 two attempts were
made independently of each other and almost simultaneously. They were the work
[9] by M. Neamtu and my work [11]. Those two criteria were very similar, but different.
Moreover, it turned out that our results were actually incompatible. We will discuss this
aspect after formulating the main result of the work [11].

2  A  criterion  for  convergence 

We give a criterion of convergence of a subdivision process under the condition that the
corresponding refinement equation has a smooth solution. We will see that symmetric
roots of the mask do not influence the convergence of subdivision processes. This means
in particular that the stability of solutions is not necessary for the convergence. The
convergence entirely depends on values of the mask at the points of so-called generalized
cycles.

Everywhere below we consider trigonometric polynomials without positive powers,
i.e., polynomials of the form $p(\xi) = \sum_{k=0}^{N} a_k e^{-ik\xi}$. As usual we set $\deg p = N$ (assuming
$a_0 a_N \ne 0$). To a given value $\alpha \in \mathbb{T}$ we assign a binary tree denoted in the sequel by $\mathcal{T}_\alpha$.
To every vertex of this tree we associate a value from $\mathbb{T}$ as follows: put $\alpha$ at the root,
then put $\alpha/2$ and $\pi + \alpha/2$ at the vertices of the first level (the level of a vertex is the
distance from this vertex to the root; the root has level 0). If a value $\gamma$ is associated to a
vertex on the $n$-th level, then the values $\gamma/2$ and $\pi + \gamma/2$ are associated to its neighbors
on the $(n + 1)$-st level. Thus there are the values $\frac{\alpha + 2\pi k}{2^n}$, $k = 0, \ldots, 2^n - 1$, on the
$n$-th level of the tree $\mathcal{T}_\alpha$. A set of vertices $A$ of the tree $\mathcal{T}_\alpha$ is called a minimal cut set if
every infinite path (all the paths are without backtracking) starting at the root includes
exactly one element of $A$. For instance the one-element set $A = \{\text{root}\}$ is a minimal cut
set. Every minimal cut set is finite.

Definition 2.1 A set $b = \{\beta_1, \ldots, \beta_n\} \subset \mathbb{T}$ is called a generalized cycle of a polynomial
$p(\xi)$ if this set is cyclic and for any $j = 1, \ldots, n$ the tree $\mathcal{T}_{\beta_j + \pi}$ possesses a minimal cut
set $A_j$ such that $p(A_j) = 0$.

The family $\{A_1, \ldots, A_n\}$ is said to be sets of zeros of the generalized cycle $b$. Let us
remark that for a given generalized cycle the set of zeros may not be defined in a unique
way. Any (regular) cycle of $p(\xi)$ is also a generalized cycle; in this simplest case each
minimal cut set $A_j$ is the root of the corresponding tree $\mathcal{T}_{\beta_j + \pi}$. On the other hand not
every generalized cycle is a regular cycle. For example, the polynomial
$p(\xi) = (e^{-i\xi} - e^{-\pi i/3})(e^{-2i\xi} - e^{\pi i/3})$ has no regular cycles, but it has a generalized cycle $b = \{\beta_1, \beta_2\} =
\{2\pi/3, 4\pi/3\}$. Indeed, this polynomial has three zeros on the period: $\pi/3$, $-\pi/6$, $5\pi/6 \in \mathbb{T}$.
The set $A_1 = \{-\pi/6, 5\pi/6\}$ is a minimal cut set for the point $\beta_1 + \pi$, $A_2 = \{\pi/3\}$ is
a minimal cut set for $\beta_2 + \pi$, and $p(A_1) = p(A_2) = 0$. Roughly speaking, each cyclic set
$\{\beta_1, \ldots, \beta_n\}$ has a unique corresponding cycle (the family of zeros is $\{\beta_1 + \pi, \ldots, \beta_n + \pi\}$)
and a variety of generalized cycles (all possible sets of zeros $\{A_1, \ldots, A_n\}$, where $A_j$ is an
arbitrary minimal cut set of the tree $\mathcal{T}_{\beta_j + \pi}$, $j = 1, \ldots, n$). Note that if at least one set
$A_j$ differs from the root $\beta_j + \pi$, then it necessarily contains a pair of symmetric roots of
$p$. Therefore, if the polynomial $p$ has no symmetric roots, then all its generalized cycles,
if there are any, are regular cycles.
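The example can be checked numerically. The sketch below labels a vertex of $\mathcal{T}_\alpha$ by a pair (level, k) carrying the value $(\alpha + 2\pi k)/2^n$; this indexing, the helper names and the test for minimal cut sets (no vertex is an ancestor of another, and the weights $2^{-\text{level}}$ sum to 1) are conventions introduced here for illustration, not notation from the paper.

```python
import cmath
import math
from fractions import Fraction

TWO_PI = 2 * math.pi


def vertex_value(alpha, n, k):
    """Value (alpha + 2*pi*k) / 2**n attached to vertex k on level n of T_alpha."""
    return ((alpha + TWO_PI * k) / 2 ** n) % TWO_PI


def is_minimal_cut_set(vertices):
    """True if every infinite path from the root meets exactly one vertex.

    The ancestor of (n, k) on a level m < n is (m, k % 2**m), so the set must
    contain no ancestor/descendant pair and its weights 2**(-level) must sum to 1.
    """
    verts = set(vertices)
    no_nesting = all(
        not (m < n and k % 2 ** m == j) for (m, j) in verts for (n, k) in verts
    )
    return no_nesting and sum(Fraction(1, 2 ** n) for n, _ in verts) == 1


def p(xi):
    """p(xi) = (e^{-i xi} - e^{-i pi/3}) (e^{-2i xi} - e^{i pi/3})."""
    z = cmath.exp(-1j * xi)
    return (z - cmath.exp(-1j * math.pi / 3)) * (z * z - cmath.exp(1j * math.pi / 3))


beta1, beta2 = 2 * math.pi / 3, 4 * math.pi / 3
cut1 = [(1, 0), (1, 1)]  # the whole first level of the tree rooted at beta1 + pi
cut2 = [(0, 0)]          # the root of the tree rooted at beta2 + pi
A1 = [vertex_value(beta1 + math.pi, n, k) for n, k in cut1]  # 5pi/6 and 11pi/6 = -pi/6
A2 = [vertex_value(beta2 + math.pi, n, k) for n, k in cut2]  # pi/3
```

Here $p$ vanishes on `A1` and `A2`, while $p(\beta_1 + \pi) \ne 0$, so $b$ is a generalized cycle of $p$ but not a regular one.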

For any trigonometric polynomial $p$ and any finite subset $Y = \{\alpha_1, \ldots, \alpha_n\} \subset \mathbb{T}$
we denote $\rho_p(Y) = \left( \prod_{q=1}^{n} |p(\alpha_q)| \right)^{1/n}$. This is a multiplicative function on the set of
trigonometric polynomials, i.e., $\rho_{p_1 p_2}(Y) = \rho_{p_1}(Y)\, \rho_{p_2}(Y)$.

Now we formulate the criterion of stability of subdivision processes.

Theorem 2.2 Suppose a refinement equation $[m]$ has a $C_0^l$-solution for some $l \ge 0$;
then the process $\{m\}$ converges in $C^l$ if and only if the mask $m$ satisfies (1.2) and for
any generalized cycle $b$ of the mask $m$ we have $\rho_m(b) < 2^{-l}$.

In particular, for $l = 0$, this means that a subdivision process $\{m\}$, whose refinement
equation has a continuous solution, converges if and only if $\rho_m(b) < 1$ for every
generalized cycle $b$ of the mask. Another corollary is Condition (2) from Section 1. Indeed,
if a mask has neither symmetric roots nor cycles, then it has no generalized cycles either.
Hence, by Theorem 2.2, the subdivision process must converge.

Example 2.3 Consider a mask

$$m(\xi) = (0.2 + 0.5e^{-i\xi} + 0.3e^{-2i\xi})\,(e^{-i\xi} - e^{-\pi i/3})^2\,(e^{-2i\xi} - e^{\pi i/3})^2. \qquad (2.1)$$

The corresponding equation $[m]$ has a $C_0$-solution; this is shown in Example 4.5. The
polynomial $m$ has a unique generalized cycle $b = \{2\pi/3, 4\pi/3\}$, the same as in the
previous example, with the same sets of zeros $A_1 = \{-\pi/6, 5\pi/6\}$, $A_2 = \{\pi/3\}$. Actually
this is not one, but two coinciding generalized cycles, if we count roots with multiplicity.
We have

$$(\rho_m(b))^2 = \left| m\!\left(\tfrac{2\pi}{3}\right) m\!\left(\tfrac{4\pi}{3}\right) \right| = \left| (-0.2 - 0.1\sqrt{3}\,i) \cdot 1 \cdot 1 \cdot (-0.2 + 0.1\sqrt{3}\,i) \cdot 4e^{4\pi i/3} \cdot 4e^{-4\pi i/3} \right| = 1.12 > 1.$$

Hence the subdivision process $\{m\}$ diverges.

3  Statement  of  the  problem 

Most examples of divergent subdivision schemes (having smooth refinable functions) are
constructed for some special class of masks. These are either “unload” masks of the form
$m(\xi) = p(n\xi)$ for some polynomial $p$ and an odd integer $n$, or, at least, masks whose
associated matrix $B = \{c_{2i-j}\}_{i,j \in \{0, \ldots, N\}}$ has a multiple eigenvalue 1. The divergence
of such schemes is well known and does not require any special criterion. A natural
question arises: does one really need the criterion of Theorem 2.2 to determine divergent
processes? Maybe the family of generalized cycles is too wide to describe unstable
subdivision schemes. In general there is no evidence that the condition $\rho_m(b) \ge 1$ can be
combined with the existence of a smooth solution for the mask $m$. In this paper we are
going to show that Theorem 2.2 indeed characterizes the family of unstable subdivision
processes properly. We show that each generalized cycle can cause the divergence of a
suitable scheme. On the other hand, we will see that every converging subdivision scheme
can be “spoiled” by some generalized cycle.

4  Preliminary  results.  Reductions  of  masks 

To construct examples of divergent processes we need some auxiliary results. The first of
them establishes two properties of cyclic sets. The proof of this lemma is an easy exercise
for the reader.

Lemma 4.1 a) Let $b$ be a cyclic set and $\alpha \in \mathbb{T}$. Then for the polynomials $p_1(\xi) =
e^{-i\xi} - e^{-i\alpha}$ and $p_2(\xi) = e^{-2i\xi} - e^{-i\alpha}$ we have $\rho_{p_1}(b) = \rho_{p_2}(b)$.

b) Let $b_1$ and $b_2$ be cyclic sets and $p(\xi) = \prod_{\beta \in b_1} (e^{-i\xi} + e^{-i\beta})$. Then we have: $\rho_p(b_2) = 1$
if $b_1 \ne b_2$, and $\rho_p(b_2) = 2$ if $b_1 = b_2$.
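Both parts of the lemma are easy to test numerically on small cyclic sets. The sketch below is only a sanity check, not a proof; the cyclic sets generated by $2\pi/3$ and $2\pi/7$, the value $\alpha = 0.7$ and all function names are choices made here for illustration.

```python
import cmath
import math


def rho(p, points):
    """rho_p(Y): geometric mean of |p| over the finite set Y."""
    prod = 1.0
    for x in points:
        prod *= abs(p(x))
    return prod ** (1.0 / len(points))


def cyclic_set(beta1, n):
    """The set {beta1, 2*beta1, ..., 2**(n-1)*beta1} mod 2*pi (assumed to be cyclic)."""
    return [(beta1 * 2 ** j) % (2 * math.pi) for j in range(n)]


def cycle_poly(b):
    """p(xi) = product over beta in b of (e^{-i xi} + e^{-i beta})."""
    def p(xi):
        out = 1.0 + 0j
        for beta in b:
            out *= cmath.exp(-1j * xi) + cmath.exp(-1j * beta)
        return out
    return p


b1 = cyclic_set(2 * math.pi / 3, 2)  # {2pi/3, 4pi/3}
b2 = cyclic_set(2 * math.pi / 7, 3)  # {2pi/7, 4pi/7, 8pi/7}
alpha = 0.7                          # arbitrary point of the circle


def p1(xi):
    return cmath.exp(-1j * xi) - cmath.exp(-1j * alpha)


def p2(xi):
    return cmath.exp(-2j * xi) - cmath.exp(-1j * alpha)


part_a = (rho(p1, b2), rho(p2, b2))    # equal by part a)
part_b_diff = rho(cycle_poly(b1), b2)  # 1 by part b), since b1 != b2
part_b_same = rho(cycle_poly(b1), b1)  # 2 by part b)
```

Part a) reflects the identity $p_2(\beta_j) = p_1(2\beta_j) = p_1(\beta_{j+1})$, which merely permutes the factors of the product over a cyclic set.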

Now turn back to the subdivision schemes. For a given integer $l \ge 0$, a mask $m$,
and a function $f \in \mathcal{L}^l$, denote $\nu_l(m, f) = -\lim_{n \to \infty} \log_2 \|T^n[f^{(l)}]\|_C^{1/n}$, where $T$ is
the subdivision operator associated to $m$ (we set $\log_2 0 = -\infty$). The value $\nu_l(m) =
\inf_{f \in \mathcal{L}^l} \nu_l(m, f)$ is the degree of convergence of the process $\{m\}$ in the space $C^l$.

For every mask $m$ we have $\nu_l(m) \le l + 1$ (see [3]). Furthermore, it was shown in [3]
and [2] that a process $\{m\}$ converges in $C^l$ if and only if $\nu_l(m) > l$. In particular the
inequality $\nu_0(m) > 0$ means that $\{m\}$ converges in $C$. Let $L$ be the maximal integer
such that $\{m\}$ converges in $C^L$ (if the process $\{m\}$ does not converge in $C$, then we
nevertheless set $L = 0$). The value $\nu_L(m)$ is said to be the degree of convergence of the
process $\{m\}$ and denoted in the sequel by $\nu(m)$. If $\nu(m_1) = \nu(m_2)$, then [...].

For a given refinement equation $[m]$ denote by $L(m)$ the maximal integer $L$ such that
the corresponding refinable function $\varphi$ belongs to $C^L$. If this equation has no
continuous compactly supported solution, we set $L(m) = -1$. The smoothness of the refinable
function $\varphi$ is the value $s(m) = L + h$, where $h$ is the Hölder exponent of the $L$-th
derivative $\varphi^{(L)}$ on $\mathbb{R}$. It is well known that a refinable function belongs to $C^l$ if and only if
$s(m) > l$ (the equality $s(m) = l$ is impossible). In particular, a refinement equation has
a $C_0$-solution if and only if $s(m) > 0$.

Now we can describe the procedure of reduction of subdivision schemes introduced
in [11]. This reduction makes it possible to get rid of both symmetric roots and cycles.

4.1 Elimination of symmetric roots

Let $p(\xi)$ be a given trigonometric polynomial (let us recall that we consider
polynomials without positive powers). Assume that $p$ possesses a pair of symmetric roots
$\{\alpha/2, \pi + \alpha/2\}$. The transfer from $p(\xi)$ to the polynomial $p_\alpha(\xi) = \frac{e^{-i\xi} - e^{-i\alpha}}{e^{-2i\xi} - e^{-i\alpha}}\, p(\xi)$ is said
to be a transfer to the previous level. The inverse transfer from $p_\alpha$ to $p$ is a transfer
to the next level. So a transfer to the previous level reduces a pair of symmetric roots
$\{\alpha/2, \pi + \alpha/2\}$ to the one root $\alpha$.

Proposition 4.2 Let a mask $\tilde m$ be obtained from a mask $m$ by a transfer to the previous
level. Then $s(\tilde m) = s(m)$. Moreover, $\nu(\tilde m) = \nu(m)$, whenever $l(\tilde m) = l(m)$.

(The constant $l(m)$ responsible for condition (1.2) was defined in Section 1.) This implies,
in particular, that the reduced equation $[\tilde m]$ possesses a smooth compactly supported
solution if and only if the initial equation $[m]$ does; and the same is true for the convergence
of the corresponding subdivision schemes. Thus, a transfer to the next (previous) level
does not change the smoothness of solutions. It also respects the rate of convergence of
subdivision processes, provided this transfer does not violate condition (1.2) (a transfer to the
previous level may increase the value $l(m)$). Using this Proposition one can successively
eliminate all symmetric roots of a given mask.

4.2  Elimination  of  regular  cycles 

Let a polynomial $p$ possess a cycle $b$. The transfer from $p(\xi)$ to the polynomial $\tilde p(\xi) =
p(\xi) / \prod_{\beta \in b} (e^{-i\xi} + e^{-i\beta})$ is called an elimination of a cycle.

Proposition 4.3 Let a mask $\tilde m$ be obtained from a mask $m$ by elimination of a cycle
$b$. Then $s(\tilde m) = s(m)$ and $\nu(m) = \max\{\nu(\tilde m), \rho_m(b)\}$.

Thus the equation $[m]$ possesses a smooth compactly supported solution if and only if
the equation $[\tilde m]$ does. Moreover, the process $\{m\}$ converges in $C^l$ if and only if the
process $\{\tilde m\}$ does and in addition $\rho_m(b) < 2^{-l}$.

See [11] for the proofs of Propositions 4.2 and 4.3. Now it becomes clear how to
establish Theorem 2.2. First we successively eliminate all symmetric roots. By Proposition
4.2 this changes neither the smoothness of the solution nor the rate of convergence (if
the initial mask satisfied condition (1.2)). Moreover, by Lemma 4.1 this process respects
the constants $\rho_m(b)$ for all cyclic sets $b$. The final mask has no symmetric roots, hence
it can have only regular cycles. Then we eliminate all regular cycles (referring to
Proposition 4.3) and obtain a mask satisfying Cohen's criterion, whose subdivision process
does converge. This line of reasoning also allows us to eliminate directly all generalized
cycles as follows.

4.3 Elimination of generalized cycles

Let a polynomial $p$ possess a generalized cycle $b$ with corresponding sets of zeros
$A_1, \ldots, A_n$. The transfer from $p(\xi)$ to the polynomial $\tilde p(\xi) = p(\xi) / \prod_{\alpha \in A_j,\ j = 1, \ldots, n} (e^{-i\xi} -
e^{-i\alpha})$ is called an elimination of a generalized cycle.

Proposition 4.4 Let a mask $\tilde m$ be obtained from a mask $m$ by elimination of a
generalized cycle $b$. Then $s(\tilde m) = s(m)$ and $\nu(m) = \max\{\nu(\tilde m), \rho_m(b)\}$.

Proof: After a suitable sequence of transfers to the previous level all the sets of zeros
$A_1, \ldots, A_n$ drop to the corresponding roots $\beta_1 + \pi, \ldots, \beta_n + \pi$, and $b$ becomes a regular
cycle. By Lemma 4.1 this does not change the value $\rho_m(b)$. Now it remains to apply
Proposition 4.3. □

Example 4.5 Consider again the mask $m(\xi)$ from Example 2.3. After eliminating the
generalized cycle $b = \{2\pi/3, 4\pi/3\}$ we obtain the mask $\tilde m(\xi) = 0.2 + 0.5e^{-i\xi} + 0.3e^{-2i\xi}$.
Since all the coefficients of $\tilde m$ are positive, it follows that the equation $[\tilde m]$ has a
$C_0$-solution and, moreover, the corresponding subdivision process $\{\tilde m\}$ converges (see, for
instance, [1]). Now applying Proposition 4.4 we see that the initial process $\{m\}$ diverges,
since $\rho_m(b) = \sqrt{1.12}$. Let us note that the matrix $B$ corresponding to the mask $m$
($B = \{c_{2i-j}\}_{i,j \in \{0, \ldots, 8\}}$) has the eigenvalue 1 with multiplicity one and has no other
eigenvalues on the unit circle. So the divergence of the subdivision scheme in this case
does not follow from the well-known argument of multiple eigenvalues.
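The numbers in this example are straightforward to reproduce. The sketch below evaluates the mask (2.1) as the product of the reduced mask and the eliminated factor and computes $(\rho_m(b))^2$ on $b = \{2\pi/3, 4\pi/3\}$; the function names `m_reduced`, `eliminated_factor` and `m` are introduced here for illustration only.

```python
import cmath
import math


def m_reduced(xi):
    """Reduced mask 0.2 + 0.5 e^{-i xi} + 0.3 e^{-2i xi} of Example 4.5."""
    z = cmath.exp(-1j * xi)
    return 0.2 + 0.5 * z + 0.3 * z * z


def eliminated_factor(xi):
    """(e^{-i xi} - e^{-i pi/3})^2 (e^{-2i xi} - e^{i pi/3})^2, removed by the elimination."""
    z = cmath.exp(-1j * xi)
    return (z - cmath.exp(-1j * math.pi / 3)) ** 2 * (z * z - cmath.exp(1j * math.pi / 3)) ** 2


def m(xi):
    """Mask (2.1) of Example 2.3."""
    return m_reduced(xi) * eliminated_factor(xi)


b = [2 * math.pi / 3, 4 * math.pi / 3]
rho_m_sq = abs(m(b[0])) * abs(m(b[1]))                      # (rho_m(b))^2
rho_reduced_sq = abs(m_reduced(b[0])) * abs(m_reduced(b[1]))
```

Under these definitions `rho_m_sq` evaluates to 1.12 up to rounding, in agreement with $\rho_m(b) = \sqrt{1.12}$, while the same quantity for the reduced mask is 0.07. The mask also satisfies $m(0) = 1$ and $m(\pi) = 0$, as conditions (1.2) and (1.3) with $l = 0$ require.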

5  Unimprovability  of  criterion.  Examples  of  divergent  schemes 

Now we are going to see that Theorem 2.2 gives a full description of divergent subdivision
schemes having smooth refinable functions. This means that all possible cases of the
criterion of convergence are realized on suitable masks. For the sake of simplicity we
formulate this result for the convergence in the space $C$, i.e., for the case $l = 0$.

Theorem 5.1 Let $b = \{\beta_1, \ldots, \beta_n\}$ be a cyclic set and let $A_1, \ldots, A_n$ be arbitrary
minimal cut sets of the trees $\mathcal{T}_{\beta_1 + \pi}, \ldots, \mathcal{T}_{\beta_n + \pi}$ respectively. Then there exists a mask
$m(\xi)$ such that

1) $m(A_j) = 0$, $j = 1, \ldots, n$, i.e., $b$ is a generalized cycle of the mask $m$, and $A_j$ are
its sets of zeros;

2) the equation $[m]$ has a $C_0$-solution, but the subdivision process $\{m\}$ does not
converge in $C$;

3) after elimination of the generalized cycle $b$ this process becomes convergent in $C$.
Proof: Consider a mask $p(\xi) = \frac{1 + e^{-i\xi}}{2}\, a(\xi)$ such that $\deg a \ge 2$, and the
subdivision process $\{p\}$ converges in $C$. To obtain such a mask it suffices to take an arbitrary
polynomial $a(\xi)$ with positive coefficients such that $a(0) = 1$. Now we use the fact that if
the process $\{p\}$ converges in $C$, then it will still converge in this space after all sufficiently
small perturbations of the coefficients of $a(\xi)$ preserving the condition $a(0) = 1$ (see [3]).
Thus, with possible perturbation of the coefficients, we assume that the trigonometric
polynomial $a$ has no real roots and that the value $\rho_a(b)$ is irrational. Such a
perturbation exists by the mean value theorem, because $\rho_a(b)$ is a continuous function of the
coefficients of $a(\xi)$. This implies, in particular, that $\rho_a(b) > 0$ and hence $\rho_p(b) > 0$.
Now take the polynomial $q(\xi) = \prod_{\alpha \in A_j,\ j = 1, \ldots, n} (e^{-i\xi} - e^{-i\alpha})$. By Lemma 4.1 we have
$\rho_{p q^r}(b) = 2^r \rho_p(b)$ for every $r \ge 0$. Consequently there exists a nonnegative integer $r$
such that $\rho_{p q^r}(b) > 1$. Take the smallest such integer $r_0$ and denote $\tilde a = a q^{r_0 - 1}$ and
$\tilde p = p q^{r_0 - 1}$ (if $r_0 = 0$, then we put $\tilde a = a$, $\tilde p = p$). Let us remark that the case $\rho_{\tilde p}(b) = 1$
is impossible, because this value is not rational, therefore $\rho_{\tilde p}(b) < 1$. Since $b$ is the only
generalized cycle of the polynomial $\tilde p$, by Proposition 4.4 the subdivision
process $\{\tilde p\}$ converges. Now make a small perturbation of the coefficients of the polynomial
$\tilde a$ after which the process $\{\tilde p\}$ still converges, and the value $\rho_{\tilde p q}(b)$ is still bigger than 1,
but the polynomial $\tilde a$ does not have real roots. Then denote $\tilde m = \tilde p$, $m = \tilde m q$. We see that
the mask $m$ has a unique generalized cycle $b$, and this cycle has sets of zeros $A_1, \ldots, A_n$.
Since $\rho_m(b) > 1$, the process $\{m\}$ diverges; however, removing this generalized cycle we
obtain the converging process $\{\tilde m\}$. This proves the theorem. □

Abstract

In previous work we introduced a method of using polynomial splines with appropriate
discontinuities to approximate a piecewise smooth function $f$ with jump discontinuities of
$f$ and $f'$. The information used is the location of the discontinuities, and low order, possibly noisy
Fourier coefficients. The number of discontinuities was limited to two at most, and the
discontinuities needed to lie at meshpoints in a uniform mesh. We showed that the linear
operator corresponding to the method is $L_2$-bounded with a modest bound, and thus
that the method is $L_2$-robust in the presence of noise. In the present paper we develop
a new method of analysis which enables us to determine operator bounds that are valid
for arbitrarily many discontinuities. The new analysis allows discontinuities to be placed
arbitrarily. Given a placement, an initially uniform spline mesh of width $h$ must be used
such that nearest meshpoints to discontinuities are at least $4h$ apart (discontinuities then
replace these meshpoints); the number of available Fourier coefficients must be at least
three times the number of mesh intervals in a period. The previous work was restricted
to quadratic splines; the present work includes cubic splines. Much of the analysis uses
exact computations with a computer algebra system. We give an example to illustrate
the accuracy of the method using noisy Fourier coefficients.

1  Introduction 

We consider approximating a function $f$ when the information consists of low order, possibly
noisy Fourier coefficients, and knowledge that $f$ is smooth except for jumps of $f$ or
$f'$ at known locations but of unknown magnitudes. We will work with a method, introduced
in [10], which amounts to linear least squares fitting of the available coefficients with the
coefficients of splines with appropriately placed discontinuities. Since we anticipate
applications to ill-posed problems where boundedness of the solution operator is crucial,
we develop a method for bounding the norm of this operator. The bounding method
depends heavily on exact computations in certain spline spaces. These computations are
fundamentally finite dimensional linear algebra with rational integer coefficients. Their
goal is to develop upper bounds for the norms of certain projector operators whose
norms are naturally expressed in terms of generalized eigenvalues, and to prove by exact
computation that the bounds are correct. A computer algebra system is used for the
computations. The programming is detailed in [9].

In [10] we obtained bounds under much more restrictive conditions than in the present
paper. In [10] the splines were quadratic only, while here results also are given for cubic
splines. The analysis in [10] required all knots of the approximating splines to be
uniformly spaced, and since the discontinuities are at the knots, the location of
discontinuities was limited. Further, in [10] the estimation process is linear in the total number
of discontinuities, and produces results unacceptably large for cases with more than one
discontinuity of $f$ and two of $f'$.

Others  ([2,  3,  4,  5])  have  addressed  questions  of  accurate  approximations  to  functions 
with  discontinuities  given  Fourier  coefficients  as  information.  In  [8]  we  give  examples 
which  show  that  those  methods  can  substantially  magnify  noise  in  the  coefficients;  our 
main  concern  here  is  to  prove  robustness  of  our  method.  We  illustrate  with  an  example 
in  Section  5. 

2  General  linear  space-theoretic  results 

Let $\mathcal V$ be a real Hilbert space with inner product $\langle\,\cdot\,,\cdot\,\rangle$. We will denote the norm
associated with $\langle\,\cdot\,,\cdot\,\rangle$ by $\|\cdot\|$. Let $\mathcal P$ and $\mathcal Q$ be closed subspaces of $\mathcal V$; suppose $P$ is the
orthogonal projector on $\mathcal P$. Here, as in [10], we deal with the approximation $f^*$ obtained
as the solution to the constrained least squares problem

$$\min \|Pf^* - Pf\|, \qquad f^* \in \mathcal Q.$$

Assuming that $P$ is invertible as a mapping on $\mathcal Q$, we denote by $P^+$ the mapping from
$P(\mathcal Q)$ to $\mathcal Q$ which inverts $P$. It is not hard to verify that $f^* = P^+RPf$ where $R$ is the
orthogonal projector on $P(\mathcal Q)$. Let $A$ denote the operator that takes $f$ to $f^*$.

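Theorem 2.1 Let $C$ be a mapping from $\mathcal V$ to $\mathcal Q$. Let $e$ be $T$-periodic and in $L^2(0, T)$.
Then

$$\|A(Pf + e) - f\| \le (\|P^+\| + 1)\,\|Cf - f\| + \|P^+\|\,\|e\|.$$

Proof: $A(Pf + e) = Af + Ae$, since $P^2 = P$ gives $APf = Af$. Because $A$ leaves members of
$\mathcal Q$ fixed, $ACf = Cf$, so $\|Af - f\| \le \|Af - Cf\| + \|Cf - f\| = \|A(f - Cf)\| + \|Cf - f\|
\le (\|A\| + 1)\|f - Cf\|$. Finally, $\|A\| = \|P^+RP\| \le \|P^+\|$ because $P$ and $R$ are
orthogonal projections. □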

A main objective of the following work will be to bound $\|P^+\|$. This will be done
by establishing upper bounds for $\|I - P\|$ as a mapping on $\mathcal Q$. From these, bounds can
easily be derived for $\|P^+\|$.

Theorem 2.2 Let $\eta < 1$ exist such that $\|(I - P)q\| \le \eta\|q\|$ for all $q \in \mathcal Q$. Then $P$ is
injective as a mapping on $\mathcal Q$ and for all $h \in P(\mathcal Q)$, $P^+$, the inverse of the restriction of
$P$ to $\mathcal Q$, satisfies

$$\|P^+h\|^2 \le \frac{\|h\|^2}{1 - \eta^2}.$$
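
The bound $\|P^+h\|^2 \le \|h\|^2/(1-\eta^2)$ of Theorem 2.2 can be checked by a short Pythagorean argument. The following is only a sketch of one standard derivation, under the assumption that $P$ is an orthogonal projector and $\|(I-P)q\| \le \eta\|q\|$ on $\mathcal Q$; it is not necessarily the proof the authors had in mind.

```latex
% For q in Q, Pq and (I-P)q are orthogonal, so
\|q\|^2 = \|Pq\|^2 + \|(I-P)q\|^2 \le \|Pq\|^2 + \eta^2\|q\|^2
\quad\Longrightarrow\quad
\|q\|^2 \le \frac{\|Pq\|^2}{1-\eta^2}.
% Hence Pq = 0 forces q = 0 (injectivity), and with h = Pq, q = P^+ h:
\|P^+h\|^2 \le \frac{\|h\|^2}{1-\eta^2}.
```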
We will obtain bounds for $\|I - P\|$ by considering the projector perpendicular to a
spline space $\mathcal G$ which is more tractable than $P\mathcal V$, and on which $I - P$ is small. In the next
section, $\mathcal Q$ is the approximating spline space, $\mathcal S$ a subspace of maximally continuous
splines, and $\mathcal G$ is a space of maximally continuous splines whose knots are in a mesh
refining the mesh for the members of $\mathcal S$. $\mathcal S$ and $\mathcal G$ have orthogonal projectors $S$ and $G$,
respectively. The following estimates $\|I - P\|$ in terms of $\|I - G\|$.

Theorem 2.3 Suppose $\|(I - P)g\| \le \eta_0\|g\|$ for all $g \in \mathcal G$. Suppose $\|(I - G)q\| \le \eta_1\|q\|$
for all $q \in \mathcal Q$. Then $\|(I - P)q\| \le (\eta_0 + \eta_1)\|q\|$ for all $q \in \mathcal Q$.

Proof: For $q \in \mathcal Q$, $\|(I - P)q\| \le \|(I - P)Gq\| + \|(I - P)(I - G)q\|$.
$\|(I - P)Gq\| \le \eta_0\|Gq\| \le \eta_0\|q\|$, and $\|(I - P)(I - G)q\| \le \|(I - G)q\| \le \eta_1\|q\|$. □

Theorem 2.4 enables us to bound $\|I - G\|$ on $\mathcal Q$ by instead bounding projectors orthogonal
to small subspaces of $\mathcal G$, restricted to small subspaces of $\mathcal Q$.

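Theorem 2.4 Let $\mathcal G$ and $\mathcal S$ be closed subspaces of $\mathcal V$ with $\mathcal S \subset \mathcal Q \cap \mathcal G$. Let $\mathcal V_1, \mathcal V_2, \ldots, \mathcal V_r$
be nonzero mutually orthogonal subspaces of $\mathcal V$. Let $\mathcal Q_i \subset \mathcal Q \cap \mathcal V_i$, $1 \le i \le r$, be nonzero
closed subspaces such that $\mathcal Q = \mathcal S + \mathcal Q_1 + \mathcal Q_2 + \cdots + \mathcal Q_r$. Let $\mathcal G_i \subset \mathcal G \cap \mathcal V_i$, $\mathcal H_i \subset \mathcal S^\perp \cap \mathcal V_i$,
$1 \le i \le r$, be nonzero closed subspaces with orthogonal projectors $G_i$, $H_i$. Let $\nu$ be a
constant such that $\|(I - G_i)q_i\|^2 \le \nu\|H_iq_i\|^2$ for all $q_i \in \mathcal Q_i$, $1 \le i \le r$. Then
$\|(I - G)q\|^2 \le \nu\|q\|^2$ for all $q \in \mathcal Q$.

Proof: $q \in \mathcal Q$ can be written $q = s + v$ where $s \in \mathcal S$ and $v = q_1 + q_2 + \cdots + q_r$, $q_i \in \mathcal Q_i$,
$1 \le i \le r$. $\|(I - G)q\| = \|(I - G)v\|$ since $\mathcal S \subset \mathcal G$. Let $F = G_1 + G_2 + \cdots + G_r$. Since
$\mathcal G_1 + \mathcal G_2 + \cdots + \mathcal G_r \subset \mathcal G$, $\|(I - G)v\|^2 \le \|(I - F)v\|^2 = \sum_{i=1}^{r}\|(I - G_i)q_i\|^2$, the latter
equality because of orthogonality of the $\mathcal Q_i$. $\|q\|^2 \ge \|(I - S)q\|^2 \ge \|\sum_{i=1}^{r}H_iq\|^2 =
\sum_{i=1}^{r}\|H_iq_i\|^2$, since $H_iq = H_iq_i$ and the $\mathcal H_i$ are orthogonal. If all $H_iq_i = 0$ the
hypothesis implies all $(I - G_i)q_i = 0$. The above then implies $(I - G)q = 0$, and the
conclusion is true. We proceed assuming $H_iq_i \ne 0$ for some $i$ and let $\mathcal N$ be the set of all
those $i$. Then

$$\frac{\|(I - G)q\|^2}{\|q\|^2} \le \frac{\sum_{i\in\mathcal N}\|(I - G_i)q_i\|^2}{\sum_{i\in\mathcal N}\|H_iq_i\|^2}.$$

An elementary argument shows the quotient of sums is $\le \nu$ since for each $i \in \mathcal N$,
$\|(I - G_i)q_i\|^2/\|H_iq_i\|^2 \le \nu$. □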
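
The "elementary argument" closing the proof of Theorem 2.4 can be spelled out as follows. This is a sketch of the usual termwise reasoning; the abbreviations $a_i$ and $b_i$ are introduced here only for illustration and do not appear in the paper.

```latex
% Write a_i = \|(I-G_i)q_i\|^2 and b_i = \|H_i q_i\|^2 > 0 for i in N.
% The hypothesis gives a_i \le \nu b_i for each i, hence
\sum_{i\in\mathcal N} a_i \;\le\; \nu \sum_{i\in\mathcal N} b_i
\quad\Longrightarrow\quad
\frac{\sum_{i\in\mathcal N} a_i}{\sum_{i\in\mathcal N} b_i} \;\le\; \nu .
```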

3  Bounds  for  restricted  projectors 

Below, we specialize the spaces of the last section, and get our main results. Let $T > 0$
be a fixed period. We take $\mathcal V$ to be the space of real-valued $T$-periodic functions which
belong to $L^2(\mathcal I)$ for some, and thus every, period interval $\mathcal I$. On $\mathcal V$ and its subspaces we define
the inner product $\langle f, g\rangle = \int_{\mathcal I} f(t)g(t)\,dt$, $\mathcal I$ a period interval. The other realizations are
defined in the statements and proofs of the following results. Lemma 3.1 sets up an
application of Theorem 2.4; Theorem 3.2 uses this, together with Theorem 2.2, to get
our main result.

Lemma 3.1 Let $X$ be a finite set of points in $[0, T)$. Let $N \ge 4$ be an integer. Let
$K = \{iT/N,\ 0 \le i < N\}$; for each $x \in X$, let $k_x$ be a member of $K$ closest to $x$ where
0 is identified with $T$. Assume $N$ large enough that between any two distinct $k_x$ are at
least three other members of $K$. Let $K_X$ result from substituting in $K$ each $x \in X$ for its
$k_x$. For $m = 3, 4$ let $\mathcal Q$ be the space of $m$-th order $T$-periodic polynomial splines with $K_X$
as knots and with continuity $C^{m-2}$ at all knots except the $x \in X$, where no continuity
is required. Let $\mathcal G$ be the space of $m$-th order periodic splines with knots in $[0, T)$ at the
points $\{iT/(3N),\ 0 \le i < 3N\}$, and let $G$ be the orthogonal projector on $\mathcal G$. Then $I - G$
restricted to $\mathcal Q$ satisfies $\|I - G\|_2^2 \le .69$ if $m = 3$, and $\|I - G\|_2^2 \le .9$ if $m = 4$.
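
The mesh construction in Lemma 3.1 (nearest mesh points $k_x$, the separation requirement, and the modified knot set $K_X$) can be sketched in a few lines. The function names below are hypothetical and chosen only for illustration, and the sketch assumes that distinct discontinuities have distinct nearest mesh points. The sample discontinuity locations are those quoted in the example of Section 5.

```python
import math

def nearest_mesh_indices(xs, T, N):
    """Index i of the uniform mesh point i*T/N closest to each x (0 identified with T)."""
    h = T / N
    return [round(x / h) % N for x in xs]

def separation_ok(xs, T, N):
    """True if any two distinct nearest mesh points have at least three
    other mesh points between them (index gap of at least 4, cyclically)."""
    idx = sorted(set(nearest_mesh_indices(xs, T, N)))
    if len(idx) < 2:
        return True
    gaps = [b - a for a, b in zip(idx, idx[1:])]
    gaps.append(idx[0] + N - idx[-1])  # wrap-around gap across the period
    return min(gaps) >= 4

def modified_knots(xs, T, N):
    """K_X: the uniform mesh with each k_x replaced by its discontinuity x.
    Assumes distinct x have distinct nearest mesh points."""
    h = T / N
    knots = {i: i * h for i in range(N)}
    for x, i in zip(xs, nearest_mesh_indices(xs, T, N)):
        knots[i] = x
    return sorted(knots.values())

X = [0.0, 0.5, 1.5, 2.5, 4.0]   # discontinuities from the Section 5 example
T = 2.0 * math.pi
print(nearest_mesh_indices(X, T, 45), separation_ok(X, T, 45))
```

With these helpers one can test a candidate $N$ before setting up the spline space; the check only covers the separation hypothesis, not the requirement on the number of Fourier coefficients.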

Proof: Let $\mathcal S$ be the subspace of $\mathcal Q$ consisting of those splines which are $C^\infty$ at the
$k_x$. Clearly $\mathcal S \subset \mathcal G$. Let $h = T/N$. Fix $x = x_i \in X = \{x_1, x_2, \ldots, x_r\}$ and let $y_0 = x_i$,
$y_\alpha = k_{x_i} + \alpha h$, $\alpha = -2, -1, 1, 2$. Take $\mathcal V_i$ to be the subspace of $\mathcal V$ consisting of those
functions with support in $[y_{-2}, y_2]$ and its $T$-translates.

For $m = 3$ let $j_1$ and $j_2$ be B-splines with knots $y_{-1}, y_0, y_0, y_0$ and $y_0, y_0, y_0, y_1$; let
$j_3$ be the difference of the B-splines with knots $y_{-2}, y_{-1}, y_0, y_1$ and $y_{-1}, y_0, y_1, y_2$ (see [1]
for explanation of multiplicity versus degree of continuity). For $m = 4$ let $j_1$ and $j_2$ be
B-splines with knots $y_{-1}, y_0, y_0, y_0, y_0$ and $y_0, y_0, y_0, y_0, y_1$; let $j_3$ be the difference of the
B-splines with knots $y_{-2}, y_{-1}, y_0, y_0, y_1$ and $y_{-1}, y_0, y_0, y_1, y_2$; and let $j_4$ be the B-spline with
knots $y_{-2}, y_{-1}, y_0, y_1, y_2$. Since $y_2 - y_{-2} < T$ we may identify the $j_\alpha$ with their $T$-periodic
extensions.

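Let $\mathcal Q_i$ be the space of splines whose generic member is $q_i = \sum_{\alpha=1}^{m} c_\alpha j_\alpha$ for constants
$c_\alpha$. For each $i$, nonzero members of $\mathcal Q_i$ have continuity from $C^{m-2}$ through full
discontinuity at $x_i$, while members of $\mathcal S$ are $C^\infty$ at $x_i$. It follows that
$\mathcal S \cap (\mathcal Q_1 + \mathcal Q_2 + \cdots + \mathcal Q_r) = 0$ and $\mathcal Q = \mathcal S + \mathcal Q_1 + \cdots + \mathcal Q_r$.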

Let $\mathcal G_i$ be the subspace of $\mathcal G$ with basis the $C^{m-2}$ periodic B-splines whose knots
in the period containing $[y_{-2}, y_2]$ are length $m + 1$ sublists of consecutive knots from
the list $(\alpha h/3 + k_x,\ -6 \le \alpha \le 6)$. Let $\mathcal H_i$ be the space of those $m$-th order periodic
splines which in $[-T/2 + k_x, T/2 + k_x]$ have support in $[y_{-2}, y_2]$, which have knots at
the $y_i$, $i \ne 0$, and at $x$, are $C^{m-2}$ at $y_{-1}$ and $y_1$, which may be fully discontinuous at
$y_{-2}$, $y_2$, and $x$, and which are orthogonal to all members of $\mathcal S$. $\|(I - G_i)q_i\|^2/\|H_iq_i\|^2$ is a
ratio of quadratic forms in the $c_\alpha$. An upper bound $\nu$ for it can be obtained as an upper
bound for the eigenvalues of the pencil $A - \lambda B$ where $a_{\alpha\beta} = \langle (I - G_i)j_\alpha, (I - G_i)j_\beta\rangle$,
$b_{\alpha\beta} = \langle H_ij_\alpha, H_ij_\beta\rangle$.

In [9] explicit bases for the spaces $\mathcal G_i$ and $\mathcal H_i$ are calculated as $m$-th order splines.
From their definitions ([1]), B-splines are rational functions of the knots, and thus so are
inner products of B-splines. The null-basis and orthogonal projection calculations
in [9] use standard methods which involve only rational operations. Thus the $(I - G_i)j_\alpha$
and $H_ij_\alpha$ and then the $a_{\alpha\beta}$ and $b_{\alpha\beta}$ are rational functions of the knots of $q_i$, so long
as $x$ remains in $[k_x, k_x + h/3]$. When $x$ crosses into $[k_x + h/3, k_x + h/2]$, thus crossing
knots for splines in $\mathcal G_i$, the rational functions change, so in general the matrix entries
are piecewise rational functions of $x$.

Let $\nu$ be a conjectured upper bound for the maximum eigenvalue $\lambda_{\max}$ of $A - \lambda B$ (in
[9] a floating point approximation to $\lambda_{\max}$ is plotted as a function of $x$; $\nu$ is determined
from inspecting this plot). For computational convenience in [9] we represent $x$ as
$2\epsilon h/3 + k_x$, $0 \le \epsilon \le 1/2$, for $x \le k_x + h/3$, and as $(1 + \epsilon)h/3 + k_x$, $0 \le \epsilon \le 1/2$, for
$k_x + h/3 \le x \le k_x + h/2$. For further convenience we take $k_x = 0$, clearly losing no
generality. We have represented only $x \ge k_x$, but because of symmetry, $x < k_x$ produces
the same bounds.

Since $h$ is a linear factor in all knots in the calculation, we see that $a_{\alpha\beta}$ and $b_{\alpha\beta}$
can be written as $h$ multiplying piecewise rational functions of $\epsilon$ (with integer rational
coefficients). The determinant of $A - \nu B$ is thus $h^m$ times a piecewise rational function
of $\epsilon$. The MAXRAT algorithm ([9]) proves that its reciprocal is bounded as a function
of $\epsilon$ in the appropriate ranges, so the determinant itself is bounded away from 0. In [9],
$\epsilon$ is then set equal to 0 in $A - \tau B$, and the determinant of that matrix is then shown
to have $m$ sign changes as $\tau$ decreases from $\nu$. Thus the conjectured value $\nu$ bounds all
eigenvalues of $A - \lambda B$ for all values of $x$. The upper bounds thus obtained are $\nu = .69$
for $m = 3$ and $\nu = .9$ for $m = 4$. We emphasize that the B-splines, matrix entries, and
determinants all are calculated exactly, using the Maple ([6, 7]) computer algebra system,
so the bounding property of $\nu$ is rigorously proven. Since the bounds we obtain apply to
the spaces $\mathcal G_i$ and $\mathcal H_i$ associated with any one of the $x_i$, they satisfy the hypotheses of
Theorem 2.4 which now provides our conclusions. □

Our  main  result  now  follows. 

Theorem 3.2 Let the hypotheses be those of Lemma 3.1. In addition, let $P$ be the
orthogonal projector onto the space of $n$-th order real-valued $T$-periodic trigonometric
polynomials, where $n \ge 3N$. If $m = 3$, we have $\|P^+\|_2 \le 2.4$, while if $m = 4$, we have
$\|P^+\|_2 \le 4.5$.

Proof: The space $\mathcal G$ in Lemma 3.1 consists of periodic splines with uniformly spaced
knots. Theorem 3.1 of [10] implies that $\|I - P\|_2 \le (a/(1 + a))^{1/2}$ where

$$a = 4\sum_{r=1}^{\infty}\left(\frac{1}{1 + 2r}\right)^{2m}.$$

In [9] we use this formula to get upper bounds of .076 when $m = 3$ and .025 when $m = 4$
for $\|I - P\|_2$ on $\mathcal G$. Taking these bounds as $\eta_0$ in Theorem 2.3 and taking the bounds from
Lemma 3.1 (after square roots, $\sqrt{.69} \approx .831$ and $\sqrt{.9} \approx .949$) as $\eta_1$ in Theorem 2.3, we
obtain from that theorem bounds for $\|I - P\|_2$ on $\mathcal Q$ of .907 for $m = 3$ and .974 for
$m = 4$. Theorem 2.2 now applies to produce the present results. □
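
The arithmetic behind Theorem 3.2 can be reproduced numerically. The sketch below is an illustration rather than the exact-arithmetic procedure of [9]: it truncates the series for $a$ in floating point, assumes that the Lemma 3.1 constants .69 and .9 bound the squared norm of $I - G$ (so that $\eta_1$ is their square root), and applies the bound $1/\sqrt{1 - \eta^2}$ of Theorem 2.2. The function names are hypothetical.

```python
import math

def eta0(m, terms=10_000):
    """(a/(1+a))**0.5 with a = 4 * sum_{r>=1} (1/(1+2r))**(2m); series truncated."""
    a = 4.0 * sum((1.0 / (1 + 2 * r)) ** (2 * m) for r in range(1, terms + 1))
    return math.sqrt(a / (1.0 + a))

def pinv_bound(eta):
    """Theorem 2.2: ||P+|| <= 1/sqrt(1 - eta**2), valid for eta < 1."""
    return 1.0 / math.sqrt(1.0 - eta * eta)

# (m, rounded-up eta0 quoted in the text, nu from Lemma 3.1)
for m, eta0_quoted, nu in [(3, 0.076, 0.69), (4, 0.025, 0.9)]:
    eta = eta0_quoted + math.sqrt(nu)      # Theorem 2.3: eta0 + eta1
    print(m, round(eta0(m), 4), round(eta, 3), round(pinv_bound(eta), 2))
```

The printed values of $\eta_0 + \eta_1$ round to .907 and .974, and the resulting bounds fall below the stated 2.4 and 4.5.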

Above, we required $n \ge 3N$; under this condition we can get our simplest and most
comprehensive results. Since we contemplate applying our results where the number $n$
of useful coefficients may be limited, we have tried to get versions of Theorem 3.2 where
$n$ is smaller compared with $N$. We have no useful versions for $n < 3N$ and $m = 4$ (cubic
splines). The following result for quadratic splines may be useful. To formulate it, let
$\epsilon_1 = \max\{|x - k_x|N/T\}$. In the previous results, the separation of the values $x$ from
their nearest uniform mesh points $k_x$ was unrestricted, which corresponds to $\epsilon_1 = 1/2$.
Here, we can get results for quadratic splines, and $n \ge 2N$, provided the $x$ are more
restricted; our methods of analysis "blow up" for $n \ge 2N$ as $\epsilon_1$ approaches a number
slightly larger than .25.

Theorem 3.3 Let $m = 3$ (quadratic splines); let $n \ge 2N$. Otherwise, let the hypotheses
be those of Theorem 3.2. Corresponding to the list 0, .1, .2, .25 for values of $\epsilon_1$, we have
the list of values 1.7, 2.1, 3.9, 16 as bounds for $\|P^+\|_2$.

Proof: For each of the cases for $\epsilon_1$, an argument similar to the proof of Lemma 3.1
applies to produce a bound $\eta_1$ for $\|I - G\|_2$ where $\mathcal G$ now is defined using the uniform
knot spacing $1/(2N)$ rather than $1/(3N)$. The only difference in the argument is that
here, a discontinuity location $x$ always stays in the interval $[k_x, k_x + \epsilon_1h]$ where $h = T/N$,
so the matrix entries and determinants can be treated as functions of $\epsilon$ in $[0, \epsilon_1]$. Each
bound $\eta_1$ now is used just as in the proof of Theorem 3.2, to get the present bounds for
$\|P^+\|_2$. □

4  Uniform  norm  bounds 

Using representers of point evaluation, as in [8], we can get uniform norm bounds for $P^+$,
and thus for $A$. The arguments are similar to those in [8]. The main difference is that
there the mesh is uniform and the order $m$ is 3. The constructions of representers extend
fairly easily to the present case: here the norms of representers are functions both of the
evaluation point and the location of the discontinuity nearest to the evaluation point.

One can show that for each point $t \in [0, T)$, a spline $r_t$ exists in a space $\mathcal U$ containing $\mathcal Q$,
such that $\langle r_t, q\rangle = q(t)$ for each $q \in \mathcal Q$, and such that $\|r_t\|_2 \le k/\sqrt{h}$ where $k = 5$ for $m = 3$
and $k = 7$ for $m = 4$; $h = T/N$ as before. The computations for the construction and bound
calculations are in [9]. Noting that $\sqrt{T}/\sqrt{h} = \sqrt{N}$, we have

$$\|Af\|_\infty \le \max_t\|r_t\|_2\,\|Af\|_2 \le (k/\sqrt{h})\,\|P^+\|_2\,\sqrt{T}\,\|f\|_\infty \le k\sqrt{N}\,\|P^+\|_2\,\|f\|_\infty.$$

When $N \le 100$ and the hypotheses are those of Lemma 3.1, this gives $\|Af\|_\infty \le
120\|f\|_\infty$ for $m = 3$, and $\|Af\|_\infty \le 315\|f\|_\infty$ for $m = 4$.
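
The constants 120 and 315 follow from the factor $k\sqrt{N}\,\|P^+\|_2$ with $N = 100$ and the bounds of Theorem 3.2. A minimal check (the helper name is hypothetical):

```python
import math

def uniform_bound(k, N, pinv_norm):
    """k * sqrt(N) * ||P+||_2, the factor multiplying ||f||_inf."""
    return k * math.sqrt(N) * pinv_norm

print(uniform_bound(5, 100, 2.4))  # m = 3
print(uniform_bound(7, 100, 4.5))  # m = 4
```

Because the factor grows like $\sqrt{N}$, smaller $N$ gives proportionally smaller constants.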

5  Example 

Fig. 1: $f - Af$, no noise; $f - Af$, 1% noise; exact $f$.

We illustrate the method using an example where the function $f$ is $2\pi$-periodic and on
$[0, 2\pi)$ consists of the function $e^{-x/6}$ with a piecewise quadratic added, so as to produce
discontinuities at 0, .5, 1.5, 2.5, and 4. $f$ is a modification of an example in [2]; for
convenience we have shifted that example left by 1 unit, and we have added the exponential term
because our method can represent a piecewise quadratic exactly in the absence of noise.
Exact (up to 17-decimal digit floating point error) Fourier coefficients are derived from
$f$ by exact integration using the Maple ([6, 7]) system. Noisy approximate coefficients
are also derived by sampling $f$ at 1024 equidistant values in $[0, 2\pi]$, adding uniformly
distributed pseudo-random noise to the samples, and taking the discrete Fourier
transform of the samples. In effect, we work with $f + e$ where $e$ is a perturbing function.
The level of the noise is set so that the discrete $L^2$-norm of the noise vector is 1% of
the discrete $L^2$-norm of the vector of samples of $f$. $N = 45$ and thus $n = 135$ are the
smallest values of $n$ and $N$ for which the hypotheses of the previous section are satisfied.
Using these values, we proceed with $m = 4$ (cubic splines) for each of these cases for
Fourier coefficients. Plots of $f$ and of the error for the two cases appear in the figure.
The ratio $\|f - A(f + e)\|_2/\|f\|_2$ is about .005 for the case of 1% noise. In [9] we develop
a probabilistic estimate of .0037 for the ratio $\|e\|_2/\|f\|_2$. This estimate indicates an
$L^2$-norm noise magnification of about 1.35-fold, compared with the upper bound of 4.5
given in Theorem 3.2. The uniform error, for noise-free coefficients, is about $10^{-9}$;
computational experiments show this is dominated by truncation error in approximating the
exponential term. In [9] we do the corresponding calculations for $m = 3$, and find similar
results for 1% noise, with larger, but still small, error for noise-free coefficients.
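
The way the noisy coefficients are produced (sample, add uniform noise scaled to 1% in the discrete 2-norm, take a discrete Fourier transform) can be sketched as follows. This is an illustration under stated assumptions, not the code of [9]: the test function `stand_in_f` is a hypothetical stand-in rather than the paper's $f$, the sample points are taken as $2\pi j/M$, $j = 0, \ldots, M-1$, and a direct summation replaces an FFT so that only the standard library is needed.

```python
import cmath
import math
import random

def sample_with_noise(f, M=1024, level=0.01, seed=0):
    """Sample f at M equidistant points of [0, 2*pi) and add uniform noise whose
    discrete 2-norm is `level` times that of the clean sample vector."""
    rng = random.Random(seed)
    clean = [f(2.0 * math.pi * j / M) for j in range(M)]
    raw = [rng.uniform(-1.0, 1.0) for _ in range(M)]
    scale = (level * math.sqrt(sum(c * c for c in clean))
             / math.sqrt(sum(r * r for r in raw)))
    noisy = [c + scale * r for c, r in zip(clean, raw)]
    return clean, noisy

def fourier_coeffs(samples, n):
    """Discrete approximations to the Fourier coefficients of orders -n..n."""
    M = len(samples)
    return {
        k: sum(s * cmath.exp(-2j * math.pi * k * j / M)
               for j, s in enumerate(samples)) / M
        for k in range(-n, n + 1)
    }

def stand_in_f(x):
    """Hypothetical test function: smooth exponential plus one jump (not the paper's f)."""
    return math.exp(-x / 6.0) + (1.0 if x >= 2.5 else 0.0)

clean, noisy = sample_with_noise(stand_in_f)
coeffs = fourier_coeffs(noisy, 5)   # the example in the text uses n = 135
```

Fixing the seed makes the perturbation reproducible, which is convenient when comparing reconstruction methods on the same noisy data.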

In  [9],  we  implement  Eckhoff’s  method  as  described  in  [3],  used  on  the  above  data. 
For  noiseless  data,  the  results  are  comparable  to  those  reported  by  Eckhoff  for  similar 
examples.  The  uniform  norm  error  seems  to  be  about  .06,  with  errors  at  jumps  somewhat 
smaller.  For  1%  noise,  the  results  of  Eckhoff’s  method  are  about  750-fold  in  error. 
Abstract

The response of a unity-feedback system with a delay element in the forward path exhibits
a periodic component that can be approximated by truncating its harmonic expansion.
Rational approximants of the transfer function $e^{-Ts}$ of such an element can simply be
obtained from this closed-loop approximation. A unifying approach to recent methods based
on this criterion [2, 3] is presented, which allows us to point out their respective features.
The standard Padé technique and a heuristic method described in [5] are also considered.

1  Introduction  and  problem  statement 

In  modelling  dynamic  systems  for  control  purposes,  it  is  often  necessary  to  account  for 
time  delays  due,  e.g.,  to  transport  phenomena  or  distributed-parameter  components. 

The response of an ideal delay element (delayor) to an input $u(t)$, identically equal to
0 for $t < 0$, is $y(t) = u(t - T)$, $T > 0$, where $T$ indicates the time delay. By denoting with
$U(s)$ the Laplace transform of $u(t)$, the Laplace transform of $y(t)$ is $Y(s) = e^{-Ts}U(s)$.
Therefore the transfer function of the delayor is the transcendental function $e^{-Ts}$.

The problem of approximating $e^{-Ts}$ by means of a rational function has a long history
(see, e.g., [1]) but is still important from both the computational and the conceptual
point of view; a few recent contributions on the subject are quoted in [2]. In many
practical applications the physical realizability and the stability of the approximant
limit the choice of the approximant to proper rational functions with real coefficients
and a Hurwitz denominator. These requirements are satisfied by Blaschke products, i.e.,
functions of the form:

$$B(s) = \frac{\prod_{i=1}^{n}(\bar a_i - s)}{\prod_{i=1}^{n}(s + a_i)}, \qquad \mathrm{Re}[a_i] > 0. \qquad (1.1)$$

This has the desirable property that $|B(j\omega)| = |e^{-jT\omega}| = 1$, $\forall\omega$, and $\arg[B(j\omega)]$ is
monotonically decreasing with $\omega$ like $\arg[e^{-jT\omega}] = -T\omega$. On the other hand, the step
response of a system with transfer function $B(s)$ starts from $+1$ or $-1$, whereas the step
response of an ideal delayor obviously starts from 0.

The most widely adopted method to form a rational approximant of a delay element is
based on the Padé technique which does not always guarantee stability (even if biproper
Padé models are necessarily stable). Since such a technique leads to the retention of the
first Maclaurin expansion coefficients of $e^{-Ts}$, the resulting approximation is the best in
the neighbourhood of $\omega = 0$. In different frequency bands, other types of models may be
preferred.

In  [3]  a  unity-feedback  system  whose  forward  path  consists  of  a  delayor  is  analysed. 

In the case of negative feedback, the unit step response is a piecewise constant function
taking on the value 0 for $2kT \le t < (2k + 1)T$ and the value 1 for $(2k + 1)T \le t <
(2k + 2)T$, $k \ge 0$, which can be decomposed into a step of amplitude $\frac12$ and a square
wave of amplitude $\frac12$ starting from $-\frac12$ at $t = 0$.

In the case of positive feedback, similar considerations allow us to decompose the unit
step response into a linear ramp of slope $\frac1T$, a step of amplitude $-\frac12$, and a saw-tooth
wave that linearly decreases from $\frac12$ to $-\frac12$ in every period from $kT$ to $(k + 1)T$.

In both cases, the periodic component can easily be expressed as a series of harmonic
terms (for $t > 0$). It is therefore natural to approximate the step response of the
unity-feedback system by retaining the non-periodic component together with a suitable
number of the first harmonics of the periodic component.
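
For the negative-feedback case, the idea of keeping the step of amplitude $\frac12$ plus the first $K$ harmonics of the square wave can be illustrated numerically. The sketch below assumes the standard Fourier series of a square wave of amplitude $\frac12$ and period $2T$ that starts from $-\frac12$, namely $-\frac{2}{\pi}\sum_k \sin((2k-1)\pi t/T)/(2k-1)$; the function names are hypothetical.

```python
import math

def exact_step(t, T):
    """Unit step response of the negative unity-feedback loop around a pure delay T:
    0 on [2kT, (2k+1)T) and 1 on [(2k+1)T, (2k+2)T)."""
    return 0.0 if int(t // T) % 2 == 0 else 1.0

def approx_step(t, T, K):
    """Step of amplitude 1/2 plus the first K harmonics of the square wave."""
    harmonics = sum(
        math.sin((2 * k - 1) * math.pi * t / T) / (2 * k - 1)
        for k in range(1, K + 1)
    )
    return 0.5 - (2.0 / math.pi) * harmonics

T = 1.0
for t in (0.5, 1.5, 2.5):
    print(t, exact_step(t, T), round(approx_step(t, T, 200), 3))
```

Away from the switching instants the truncated sum approaches the exact piecewise constant response as $K$ grows, with the usual Gibbs oscillations near the jumps.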

A rational approximation $W_a(s)$ of the transcendental transfer function $W(s)$ of the
above-mentioned feedback system is obtained by dividing the Laplace transform of the
approximate step response by the Laplace transform $\frac1s$ of the step input. The rational
approximant $G_a(s)$ of the delayor transfer function is then determined as

$$G_a(s) = \frac{W_a(s)}{1 \mp W_a(s)}, \qquad (1.2)$$

where the minus sign applies to the case of negative feedback and the plus sign to that
of positive feedback. It turns out [3] that $G_a(s)$ is a stable biproper rational function
having the form of a Blaschke product; precisely, negative feedback supplies even-order
approximants and positive feedback produces odd-order approximants.

Obviously, the same result could be achieved by referring to different inputs (even
an impulse), but the choice of the unit step is particularly convenient. According to
the terminology suggested in [4], the rationale of such a procedure consists in retaining
the "input component" (and the "resonant component", if any) and in truncating the
periodic "system component" of the response.

In [2] a feedback structure is used as well, but another approximation criterion is
adopted, which leads to different models depending on the chosen input. In particular,
the family of inputs considered in [2] is $\{u(t) = t^m,\ m \in \mathbb{N},\ t \ge 0\}$, and the procedure
exploits several properties of Bernoulli numbers and polynomials.

In the following, the above approaches are presented in a unified form which allows
us to point out their respective features and to derive the related approximants in an
easier way. Finally, criteria are given to choose the approximation that is most suited to
the application at hand, also taking into account the standard Padé approximation and
a further approximation presented in [5].

2  Derivation  of  the  approximant 

For the sake of simplicity, we shall almost exclusively refer to the case of negative
feedback; only a brief mention will be made of the case of positive feedback.


2.1  Negative  feedback 

The transfer function $W(s)$ of the negative feedback system with forward-path transfer
function $G(s) = e^{-Ts}$ is

$$W(s) = \frac{e^{-Ts}}{1 + e^{-Ts}} = \frac{1}{1 + e^{Ts}}, \qquad (2.1)$$

whose singularities (poles) are the roots of $e^{Ts} = -1$, i.e.,

$$s = \pm jp_k := \pm j(2k - 1)\frac{\pi}{T}, \qquad k \in \mathbb{Z}^+.$$

$W(s)$ can also be interpreted as the Laplace transform of the sequence of positive and
negative impulses forming the derivative of the step response described in the
introduction. Therefore, it is the sum of a constant equal to $\frac12$ (corresponding to the step
component in the just-mentioned step response) and a series of "harmonic" terms
associated with the above poles:

$$W(s) = \frac12 + \sum_{k=1}^{\infty}\left[\frac{r_k}{s - jp_k} + \frac{\bar r_k}{s + jp_k}\right],$$

where the bar denotes conjugate and, using the standard formula for the residues,

$$r_k = \lim_{s\to jp_k}(s - jp_k)W(s) = -\frac{1}{T}.$$

It follows that

$$W(s) = \frac12 - \frac{2}{T}\sum_{k=1}^{\infty}\frac{s}{s^2 + (2k - 1)^2\frac{\pi^2}{T^2}}. \qquad (2.2)$$

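In order to compare the results in [2] and [3], let us consider a canonical input of the
form

$$u_i(t) = \frac{t^{i-1}}{(i - 1)!}, \qquad t \ge 0, \qquad (2.3)$$

whose Laplace transform is

$$U_i(s) = \frac{1}{s^i}.$$

(In [3] only the case of $i = 1$ is considered, whereas the inputs used in [2] differ from
(2.3) by a scaling factor which is irrelevant for the following considerations.)

On the basis of (2.2) the Laplace transform of the (forced) response to (2.3),

$$Y_i(s) = \frac{1}{s^i}W(s),$$

can be rewritten as

$$Y_i(s) = \sum_{h=0}^{i-1}\frac{c_h}{s^{i-h}} + \sum_{k=1}^{\infty}\frac{\alpha_{ki} + \beta_{ki}s}{s^2 + (2k - 1)^2\frac{\pi^2}{T^2}},$$

where for $i$ even,

$$\alpha_{ki} = 0, \qquad \beta_{ki} = (-1)^{\frac{i}{2}+1}\,\frac{2}{T}\left[\frac{T}{(2k - 1)\pi}\right]^{i}, \qquad (2.4')$$

and for $i$ odd,

$$\alpha_{ki} = (-1)^{\frac{i+1}{2}}\,\frac{2}{T}\left[\frac{T}{(2k - 1)\pi}\right]^{i-1}, \qquad \beta_{ki} = 0. \qquad (2.4'')$$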

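Therefore, $W(s)$ can also be presented in the alternative form

$$W(s) = \frac{Y_i(s)}{U_i(s)} = \sum_{h=0}^{i-1}c_hs^h + \sum_{k=1}^{\infty}\frac{\alpha_{ki}s^i + \beta_{ki}s^{i+1}}{s^2 + (2k - 1)^2\frac{\pi^2}{T^2}}. \qquad (2.5)$$

Each term of the series in (2.5) is given by the sum of a polynomial of degree $i - 1$
(quotient of the division of its numerator by its denominator) and a strictly proper
rational function (whose numerator is the remainder of the division). Therefore, (2.5)
becomes

$$W(s) = \sum_{h=0}^{i-1}c_hs^h + \sum_{k=1}^{\infty}\left\{\sum_{h=0}^{i-1}d_{ki,h}s^h + \frac{\gamma_{ki} + \delta_{ki}s}{s^2 + (2k - 1)^2\frac{\pi^2}{T^2}}\right\}, \qquad (2.6)$$

which can be rewritten as

$$W(s) = \sum_{h=0}^{i-1}\left[c_h + \sum_{k=1}^{\infty}d_{ki,h}\right]s^h + \sum_{k=1}^{\infty}\frac{\gamma_{ki} + \delta_{ki}s}{s^2 + (2k - 1)^2\frac{\pi^2}{T^2}}. \qquad (2.7)$$

By comparing (2.7) with (2.2), one finds that

$$c_0 + \sum_{k=1}^{\infty}d_{ki,0} = \frac12, \qquad (2.8)$$

$$c_h + \sum_{k=1}^{\infty}d_{ki,h} = 0, \qquad h > 0, \qquad (2.9)$$

$$\gamma_{ki} = 0, \qquad \forall k, i,$$

$$\delta_{ki} = -\frac{2}{T}, \qquad \forall k, i.$$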
The procedure suggested in [3] could alternatively be presented with reference to
expression (2.7) where coefficients related to the specific input appear. Precisely, the
approximant $W_a(s)$ is obtained in this case by adding to the exact value $\frac12$ of the first
sum (cf. (2.8) and (2.9)) the first $K$ (harmonic) terms of the second summation

$$W_a(s) = \frac12 - \frac{2}{T}\sum_{k=1}^{K}\frac{s}{s^2 + (2k - 1)^2\frac{\pi^2}{T^2}},$$

which is independent of the input $u_i(t)$.
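
The truncated expansion $W_a(s) = \frac12 - \frac{2}{T}\sum_{k=1}^{K} s/(s^2 + (2k-1)^2\pi^2/T^2)$ and the resulting delay approximant $G_a(s) = W_a(s)/(1 - W_a(s))$ for negative feedback can be explored numerically. The following sketch uses hypothetical function names and plain floating point; it checks on the imaginary axis that $|G_a(j\omega)| = 1$ (the all-pass property of a Blaschke product) and compares $G_a(j\omega)$ with $e^{-j\omega T}$.

```python
import cmath
import math

def W_a(s, T, K):
    """1/2 - (2/T) * sum_{k=1..K} s / (s^2 + ((2k-1)*pi/T)^2)."""
    total = 0.5 + 0.0j
    for k in range(1, K + 1):
        p = (2 * k - 1) * math.pi / T
        total -= (2.0 / T) * s / (s * s + p * p)
    return total

def G_a(s, T, K):
    """Delay approximant for negative feedback: W_a / (1 - W_a)."""
    w = W_a(s, T, K)
    return w / (1.0 - w)

T, omega = 1.0, 0.7
for K in (1, 3, 10):
    g = G_a(1j * omega, T, K)
    print(K, abs(g), abs(g - cmath.exp(-1j * omega * T)))
```

On $s = j\omega$ every harmonic term is purely imaginary, so $W_a = \frac12 - jX$ with $X$ real and $G_a = (\frac12 - jX)/(\frac12 + jX)$ has unit modulus for any $K$; only the phase improves as $K$ increases.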

The procedure suggested in [2] refers instead to expressions (2.5) or (2.6), and the
approximation consists in truncating the summation over $k$, where each addendum is
formed by a polynomial and a strictly proper harmonic term. Therefore the resulting
$W_a(s)$ is

$$W_a(s) = \sum_{h=0}^{i-1}c_hs^h + \sum_{k=1}^{K}\frac{\alpha_{ki}s^i + \beta_{ki}s^{i+1}}{s^2 + (2k - 1)^2\frac{\pi^2}{T^2}}, \qquad (2.10)$$

which does depend on $i$ and is not proper because the part added to the harmonic
terms does not reduce to the constant $\frac12$ as is instead the case in $W(s)$. Nevertheless,
the approximant $G_a(s) = W_a(s)/(1 - W_a(s))$ of $e^{-Ts}$ turns out to be biproper.

As concerns the computation of the above approximants, the suggested approach
seems to be preferable to that adopted in [2] because
(i) coefficients $c_h$, which correspond to the first $i$ Maclaurin expansion coefficients of

$$W(s) = \frac{1}{1 + \sum_{h=0}^{\infty}\frac{(Ts)^h}{h!}},$$

can easily be evaluated using the classic Padé procedure, and
(ii) formulae (2.4') and (2.4'') immediately supply coefficients $\alpha_{ki}$, $\beta_{ki}$.
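
As an illustration of point (i), the first $i$ Maclaurin coefficients $c_h$ of $W(s) = 1/(1 + e^{Ts})$ can be generated exactly. The sketch below does not follow the Padé procedure mentioned in the text; it uses the ordinary recurrence for the reciprocal of a power series, with exact rational arithmetic, and the function name is hypothetical.

```python
from fractions import Fraction
from math import factorial

def maclaurin_c(i, T=1):
    """First i Maclaurin coefficients c_0..c_{i-1} of W(s) = 1 / (1 + exp(T*s)).

    The denominator series is d_0 = 2, d_h = T**h / h! for h >= 1, and the
    coefficients satisfy sum_{j=0..n} d_j * c_{n-j} = [n == 0]."""
    T = Fraction(T)
    d = [Fraction(2)] + [T ** h / factorial(h) for h in range(1, i)]
    c = []
    for n in range(i):
        acc = Fraction(1 if n == 0 else 0)
        acc -= sum(d[j] * c[n - j] for j in range(1, n + 1))
        c.append(acc / d[0])
    return c

print(maclaurin_c(4))
```

Apart from $c_0 = \frac12$, the even-index coefficients vanish, which reflects the fact that $W(s) - \frac12$ is an odd function of $s$.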

2.2  Positive  feedback 


Considerations analogous with those of Section 2.1 lead to the following transfer function
in the case of positive feedback

$$W(s) = \frac{1}{Ts} - \frac12 + \frac{2}{T}\sum_{k=1}^{\infty}\frac{s}{s^2 + (2k)^2\frac{\pi^2}{T^2}}, \qquad (2.11)$$

so that $Y_i(s) = W(s)U_i(s)$ can be separated into a (harmonic) series associated with the
imaginary conjugate poles of $W(s)$ and a strictly proper fraction with denominator $s^{i+1}$.
Using the terminology in [4], the mentioned series corresponds to the "system
component" of the forced response and the fraction corresponds to its "interaction component"
because the poles of the latter are common to $W(s)$ and $U_i(s)$ (no "input component"
is present in this case since $U_i(s)$ does not exhibit poles different from those of $W(s)$).

As  shown  in  [3],  the  truncation  of  the  series  in  (2.2)  results  in  even-order  biproper 
approximants  Ga(s),  whereas  the  truncation  of  the  series  in  (2.11)  results  in  odd-order 
biproper  approximants  Ga{$)- 

Instead, as shown in [2], truncating the series in (2.5) leads to odd-order approximants,
whereas truncating the analogous series corresponding to positive feedback leads
to even-order approximants.

2.3 Stability and approximation error

It has been proved [3] that the even-order rational approximations $G_a(s)$ of $e^{-Ts}$ obtained
from (2.1), as well as the odd-order ones obtained by truncating (2.11), are stable.
Instead, as explicitly stated in [2] for inputs $t^m$, m > 2 (i.e., using the previous notation,
$u_i(t)$ with i > 3) the “alternating sign of the Bernoulli numbers makes the approximation
in general unstable [...]. Hence, from a practical point of view, any improvement with
respect to the approximants obtained in [3] is to be found with p = 1”, i.e., i = 2, 3.

The approximation accuracy can be evaluated by referring, e.g., to the “closed-loop
error”

$$E(s) := W(s) - W_a(s).$$

From (2.1) we get

$$E(s) = E_1(s) := -\frac{2}{T}\sum_{k=K+1}^{\infty} \frac{s}{s^2 + (2k-1)^2\,\pi^2/T^2},$$

whereas from (2.10) we have

$$E(s) = E_2(s) := \sum_{h=0}^{i-1}\sum_{k=K+1}^{\infty} d_{kh}\, s^h + E_1(s).$$

Since E(s) is a complex quantity, $|E_2(s)|$ may well be smaller than $|E_1(s)|$ for certain
values of s (or jω).


3  Alternative  approximants 


As  already  pointed  out,  the  procedure  suggested  in  [2]  leads  to  approximants  that  depend 
on  the  chosen  canonical  input.  To  improve  the  approximation  within  suitable  frequency 
bands  not  centred  at  the  origin,  it  is  reasonable  to  resort  to  non-canonical  inputs  whose 
spectrum  has  larger  amplitude  there.  A  simple  choice  corresponds,  e.g.,  to 

$$U(s) = \frac{1}{1 + 2\xi\,\dfrac{s}{\omega_n} + \dfrac{s^2}{\omega_n^2}},$$

in which $\omega_n$ is at the centre of the band and ξ is suitably small.

The  choice  of  the  form  of  the  input  (as  well  as  the  order  of  the  canonical  input)  is 
somewhat  arbitrary  and  is  influenced,  in  practice,  by  empiric  considerations.  Therefore, 
it  makes  sense  to  compare  the  results  of  the  above  procedures  with  those  obtained  in 
[5]  using  a  heuristic  procedure  based  on  the  direct  approximation  of  the  phase  Bode 
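diagram of $e^{-jT\omega}$ by means of a Blaschke product $B_n(j\omega)$ of order n. For n odd, the
first factor of $B_n(s)$ has the form

$$G_1(s) = \frac{1 - \tau s}{1 + \tau s}, \qquad \tau > 0,$$

and the others have the form

$$G_i(s) = \frac{1 - 2\xi_i\,\dfrac{s}{\omega_{ni}} + \dfrac{s^2}{\omega_{ni}^2}}{1 + 2\xi_i\,\dfrac{s}{\omega_{ni}} + \dfrac{s^2}{\omega_{ni}^2}}, \qquad 1 > \xi_i > 0, \quad \omega_{ni} > 0, \tag{3.1}$$

whereas for n even all factors have form (3.1).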

All the considered techniques produce unit-magnitude all-pass frequency responses so
that the approximation they afford can be judged with reference to the phase deviation
Δ(jω) from −Tω only. As ω → ∞, Δ(jω) → ∞ in all cases. Therefore, reasonable criteria
for choosing the method most suited to the specific application are: (i) the bandwidth
$B_\varepsilon$ where |Δ(jω)| is less than a specified value ε, or (ii) the maximum $\Delta_B$ of |Δ(jω)| in
a prescribed band B.
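The two criteria can be illustrated numerically. The following Python sketch is only a hedged example: it assumes the closed-loop function W(s) = 1/(1 + e^{Ts}) with the partial-fraction expansion 1/2 − (2/T) Σ_k s/(s² + ((2k−1)π/T)²), truncates it after K terms, and forms the all-pass approximant W_a/(1 − W_a). The function names, the band and the threshold are illustrative choices, not taken from [2] or [3].

```python
import numpy as np

def closed_loop_approximant(s, K, T=1.0):
    """Truncated series 1/2 - (2/T) * sum_{k=1..K} s / (s^2 + ((2k-1)*pi/T)^2)."""
    s = np.asarray(s, dtype=complex)
    k = np.arange(1, K + 1)
    pole_sq = ((2 * k - 1) * np.pi / T) ** 2      # squared moduli of the imaginary poles
    return 0.5 - (2.0 / T) * np.sum(s[..., None] / (s[..., None] ** 2 + pole_sq), axis=-1)

def delay_approximant(s, K, T=1.0):
    """All-pass approximant G_a = W_a / (1 - W_a) of exp(-T*s) (assumed form)."""
    w = closed_loop_approximant(s, K, T)
    return w / (1.0 - w)

def phase_deviation(omega, K, T=1.0):
    """Unwrapped phase of G_a(j*omega) minus the ideal phase -T*omega, in radians."""
    omega = np.asarray(omega, dtype=float)
    g = delay_approximant(1j * omega, K, T)
    return np.unwrap(np.angle(g)) + T * omega

# Illustrative band B = [0.01, 3] rad/s, kept below the first pole at pi/T.
omega = np.linspace(0.01, 3.0, 600)
delta = phase_deviation(omega, K=2)

# Criterion (ii): largest |deviation| inside the prescribed band.
max_dev_in_band = np.max(np.abs(delta))

# Criterion (i): first grid frequency at which |deviation| reaches eps
# (the whole band if the threshold is never reached).
eps = np.deg2rad(10.0)
exceeds = np.abs(delta) >= eps
bandwidth = omega[np.argmax(exceeds)] if exceeds.any() else omega[-1]
```

On the imaginary axis every term of the truncated sum is purely imaginary, so W_a and 1 − W_a are complex conjugates and the quotient has unit magnitude; only the phase needs to be compared.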

By way of example, Fig. 1 shows Δ(jω) vs ω for the 4-th order all-pass approximants
of $e^{-s}$ (T = 1) obtained according to (2.1) with K = 2 (curve a), to the procedure
suggested in [2] for $u_3(t) = t^2$ (curve b), to the standard Padé procedure (curve c), and to
the heuristic method in [5] (curve d). For instance, with reference to criterion (i) above,
the Padé approximant is best for ε very small, the method suggested in [2] is optimal
for ε ≈ 10°, and the heuristic method and the method suggested in [3] are preferable for
ε > 45°.

Analogous  results  are  obtainable  for  approximants  of  different  order. 


Fig. 1. Phase deviations Δ(jω) for the considered 4-th order approximants.


4  Conclusions 

The approximation procedures presented in [2] and [3] have been embedded in a unified
frame which clearly points out their respective features and allows us to determine the
parameters of the approximants in an easier way. Criteria have been provided for choosing
the approximation method that is most suited to the specific application.

Abstract

In previous work we have suggested obtaining rational interpolants of a function f by
attaching optimally placed poles to its interpolating polynomials. For a large number of
interpolation points these polynomials are well-known to be good approximants only if
the nodes tend to cluster near the endpoints of the interval, as with Cebysev or Legendre
points. In practice, however, one would prefer to have them closer to equidistant. This
will in particular be the case when the difficult portion of f lies well within the interior
of the interval, or when approximating derivatives of f, as in the solution of differential
equations. To address this difficulty, we use here a conformal change of variable to shift
the points from the Cebysev position toward a more equidistant distribution in a way
that should maintain the exponential convergence when f is analytic. Numerical examples
demonstrate the resulting improvement in the quality of the approximation.

1  Introduction 

We are concerned here with rational approximation of a continuous function f on an
interval [a, b], which we may take as [-1, 1] =: I, after a linear change of variable when
necessary. We further assume that the approximant r should interpolate f between a
finite number, say N + 1, of distinct points (nodes) $x_0, x_1, \ldots, x_N$ in I. In a similar way
as in [5], r will be constructed by attaching a certain number of poles to an interpolating
polynomial.

In  some  applications,  such  as  the  numerical  solution  of  two-point  boundary  value 
problems  (see,  e.g.,  [6]),  one  may  choose  the  points  more  or  less  at  will;  in  that  case, 
one  will  place  them  so  as  to  reach  the  best  compromise  between  two  often  conflicting 
goals:  points  good  for  interpolation,  on  one  side,  and  points  favourable  for  the  condition 
of  the  problem  to  be  solved,  on  the  other  side.  In  [5],  we  have  considered  equidistant 
and  Cebysev  points,  the  first  for  their  regularity,  the  second  for  the  condition  of  the 
interpolation  and  for  the  fast  convergence  of  the  interpolant  for  very  smooth  functions. 
For the solution of two-point boundary problems in [6] we have merely used Cebysev
points.

There is in general no reason besides the problem condition for accumulating the
nodes  toward  the  boundary,  as  with  Cebysev  or  Legendre  points.  Moreover,  one  of  the 
reasons  for  using  rational  instead  of  polynomial  interpolation  is  its  better  suitability  for 
approximating  functions  with  large  slopes.  Here  too,  shifting  the  points  away  from  the 
center  may  not  be  appropriate. 

Another  odd  consequence  of  accumulating  interpolation  points  toward  the  extremities 
is  the  consequent  ill-conditioning  of  the  derivatives  of  the  interpolating  polynomials  [7, 1] . 
This  worsens  the  stability  properties  of  time-stepping  in  the  solution  of  time  evolution 
problems  with  the  method  of  lines  [13]  as  well  as  the  convergence  of  iterative  methods 
for  solving  discretized  stationary  problems  [3]. 

To address these difficulties, we will take advantage here of the fact that the fast
convergence of the interpolant can be maintained while shifting the points with a conformal
map g (independent of N) toward an equidistant position. This, however, requires an
important change to the method in [5], because this point shift ruins the exponential
convergence of the Cebysev interpolating polynomial. We therefore use here as the
starting interpolant the polynomial interpolating $f(g^{-1})$ in the domain of the inverse $g^{-1}$ of
the conformal map employed for the point shift, and attach poles to this polynomial.

Section 2 reviews the formulae and advantages of shifting Cebysev points conformally
toward the center of the interval when interpolating functions, and Section 3 briefly
recalls the method of optimally attaching poles to the interpolating polynomial introduced
in our earlier work. In Section 4 we describe how to take advantage of the better
conditioning of derivatives induced by the conformal point shift; the corresponding practical
improvements are finally documented with numerical examples.


2  Rational  interpolation  with  a  variable  change  for  point  shifts 

Let $\mathcal{P}_m$ and $\mathcal{R}_{m,n}$, respectively, denote the linear space of all polynomials of degree at
most m and the set of all rational functions with numerator in $\mathcal{P}_m$ and denominator in
$\mathcal{P}_n$; furthermore, denote by $f_k$ the interpolated values $f(x_k)$, k = 0(1)N, of f. Then, the
unique polynomial $p \in \mathcal{P}_N$ that interpolates f at the $x_k$'s,

$$p(x) = \sum_{k=0}^{N} f_k L_k(x), \qquad L_k(x) := \prod_{i \ne k} (x - x_i) \Big/ \prod_{i \ne k} (x_k - x_i),$$

can be written in its barycentric form [9]

$$p(x) = \sum_{k=0}^{N} \frac{w_k}{x - x_k}\, f_k \Bigg/ \sum_{k=0}^{N} \frac{w_k}{x - x_k}, \tag{2.1}$$

where the so-called weight $w_k$ corresponding to the point $x_k$ is given by

$$w_k := 1 \Big/ \prod_{i=0,\; i \ne k}^{N} (x_k - x_i).$$

Despite its appearance, (2.1) determines a polynomial of degree at most N: the $w_k$
are precisely the numbers which guarantee this [4]. By choosing other weights, a rational
interpolant is constructed.

The barycentric formula has several advantages over other representations of the
interpolating polynomial ([4] p. 357). One of them is the fact that the weights appear in
both the numerator and the denominator, so that they can be divided by any common
factor. For example, simplified weights for Cebysev points of the first kind $x_k^{(1)} := \cos\phi_k$,
where $\phi_k := \frac{(2k+1)\pi}{2N+2}$ and $k = 0, \ldots, N$, are given by $w_k^{(1)} = (-1)^k \sin\phi_k$ ([9] p. 249),
while for the Cebysev points of the second kind $x_k^{(2)} := \cos\frac{k\pi}{N}$ - which will be used here
- one simply has Salzer's formula ([9] p. 252)

$$w_k^{(2)} = (-1)^k \delta_k, \qquad \delta_k := \begin{cases} 1/2, & k = 0 \text{ or } k = N, \\ 1, & \text{otherwise.} \end{cases}$$


These points are, together with Legendre's, the most used nodes for global polynomial
interpolation and large N. They achieve exponential convergence of p toward f if the
latter is analytic in an ellipse $E_\rho$ with foci at ±1 and sum of its axes equal to 2ρ, ρ > 1.
However, this fast convergence comes at the cost of a concentration of the nodes in
the vicinity of the extremities of I. As mentioned above, this accumulation may have
drawbacks, such as poor spreading of the information about f over the interval and
ill-conditioning of the derivatives near the endpoints.
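As a hedged illustration of the barycentric formula with the simplified second-kind weights, the following Python sketch evaluates the interpolating polynomial of a smooth function. The function names (`cheb2_points`, `cheb2_weights`, `bary_eval`) and the test function exp are our own illustrative choices, not part of the original method description.

```python
import numpy as np

def cheb2_points(N):
    """N + 1 Chebyshev points of the second kind, x_k = cos(k*pi/N)."""
    return np.cos(np.arange(N + 1) * np.pi / N)

def cheb2_weights(N):
    """Simplified barycentric weights (-1)^k * delta_k (Salzer's formula)."""
    w = (-1.0) ** np.arange(N + 1)
    w[0] *= 0.5
    w[-1] *= 0.5
    return w

def bary_eval(x, nodes, values, weights):
    """Evaluate the barycentric quotient; at a node the data value is returned."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    diff = x[:, None] - nodes[None, :]
    hit = diff == 0.0
    diff[hit] = 1.0                       # dummy value, overwritten below
    terms = weights / diff
    result = (terms @ values) / terms.sum(axis=1)
    rows, cols = np.nonzero(hit)
    result[rows] = values[cols]
    return result

N = 20
nodes = cheb2_points(N)
weights = cheb2_weights(N)
x = np.linspace(-1.0, 1.0, 401)
max_err = np.max(np.abs(bary_eval(x, nodes, np.exp(nodes), weights) - np.exp(x)))
```

Because the weights enter numerator and denominator alike, dropping their common factor does not change the value of the quotient, which is why the simplified weights suffice.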

With a suitable choice of the interpolant, one may conformally shift the nodes toward
an equidistant position (though not all the way) without losing the exponential
convergence. For that purpose, one considers, beside the x-space in which f is to be
approximated, another space, denoted by y, say, and the N + 1 Cebysev points of the
second kind

$$y_k^{(2)} = \cos\frac{k\pi}{N}$$

in the interval J := [-1, 1] in this y-space. Let g be a conformal map from a domain $\mathcal{D}_1$
containing J (in the y-space) to a domain $\mathcal{D}_2$ containing I (in the x-space); moreover,
suppose that f is a function $\mathcal{D}_2 \to \mathbb{C}$ such that the composition $f \circ g : \mathcal{D}_1 \to \mathbb{C}$ is analytic
in an ellipse $E_\rho$, as defined above. With this map we may define new interpolation points
on I, $x_k = g(y_k)$, as well as the conformal transplantation F(y) := f(x) [10] of f into
the y-space.

Then, with the polynomial interpolating F(y) at the $y_k$,

$$A_N(y) := \sum_{k=0}^{N} F(y_k) L_k(y) = \sum_{k=0}^{N} f(x_k) L_k(g^{-1}(x)) =: a_N(x),$$

one has

$$|a_N(x) - f(x)| = O(\rho^{-N}), \qquad x \in [-1, 1]. \tag{2.2}$$


Rational interpolation with all poles prescribed is very simple in the barycentric setting
[5]: the P poles $z_i$ are attached to (2.1) by replacing $w_k$ with

$$b_k := w_k d_k, \qquad d_k := \prod_{i=1}^{P} (x_k - z_i).$$

If N ≥ P this results in a rational interpolant in $\mathcal{R}_{N,P}$ with poles at $z_i$, i = 1, ..., P
(when such an interpolant exists, see [5]).
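The pole attachment can be sketched in a few lines of Python. This is a hedged example under our own assumptions: the helper names are hypothetical, the nodes are unshifted Cebysev points of the second kind, and the test function 1/(1 + 25x²) is chosen because its poles at ±0.2i are known, so attaching exactly these poles should reproduce it up to rounding.

```python
import numpy as np

def attach_poles(nodes, weights, poles):
    """Return b_k = w_k * d_k with d_k = prod_i (x_k - z_i)."""
    poles = np.asarray(poles, dtype=complex)
    d = np.prod(nodes[:, None] - poles[None, :], axis=1)
    return weights * d

def bary_eval(x, nodes, values, weights):
    """Barycentric quotient at points x that do not coincide with a node."""
    terms = weights / (np.asarray(x, dtype=float)[:, None] - nodes[None, :])
    return (terms @ values) / terms.sum(axis=1)

N = 10
nodes = np.cos(np.arange(N + 1) * np.pi / N)      # Cebysev points of the second kind
w = (-1.0) ** np.arange(N + 1)
w[[0, -1]] *= 0.5                                 # simplified weights

f = lambda x: 1.0 / (1.0 + 25.0 * x**2)           # poles at +0.2i and -0.2i
# A conjugate pair gives real d_k, so the imaginary part can be dropped here.
b = attach_poles(nodes, w, [0.2j, -0.2j]).real

x = np.linspace(-0.995, 0.995, 200)               # grid chosen to avoid the nodes
err_poly = np.max(np.abs(bary_eval(x, nodes, f(nodes), w) - f(x)))
err_rat = np.max(np.abs(bary_eval(x, nodes, f(nodes), b) - f(x)))
```

With the poles attached, the numerator reduces to the interpolant of the constant 1/25 and the denominator to that of x² + 0.04, so `err_rat` is at rounding level while `err_poly` remains large for this N.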

Remark 2.1 Exponential convergence of interpolation at the shifted points is also
attained with the rational function given by (2.1) with $w_k = w_k^{(2)}$ [2]. However, this is
in general a rational function in $\mathcal{R}_{N,\nu}$, ν > N − P: there is not enough defect in the
denominator degree for the weights $w_k^{(2)} d_k$ to warrant the presence of the P poles $z_i$.

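We then use $a_N$ as the starting interpolant to which we attach the poles $v_i$ in the
y-space. This yields

$$R(y) := \frac{\displaystyle\sum_{k=0}^{N} \frac{w_k \prod_{i=1}^{P} (y_k - v_i)}{y - y_k}\, f_k}{\displaystyle\sum_{k=0}^{N} \frac{w_k \prod_{i=1}^{P} (y_k - v_i)}{y - y_k}} = \frac{\displaystyle\sum_{k=0}^{N} \frac{w_k \prod_{i=1}^{P} \bigl(g^{-1}(x_k) - g^{-1}(z_i)\bigr)}{g^{-1}(x) - g^{-1}(x_k)}\, f_k}{\displaystyle\sum_{k=0}^{N} \frac{w_k \prod_{i=1}^{P} \bigl(g^{-1}(x_k) - g^{-1}(z_i)\bigr)}{g^{-1}(x) - g^{-1}(x_k)}} =: r(x).$$

If a rational interpolant with these poles exists, it is given in the y-space by R, and r is
a rational function in the argument $g^{-1}(x)$. Its poles are at $z_i = g(v_i)$.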


3  Construction  of  the  optimal  interpolant 

Our method consists in optimizing the position of the $v_i$'s so as to minimize

$$\|R - F\|_\infty = \|r - f\|_\infty,$$

as described in §3 of [5]. Optimal $v_i$'s always exist, but these are not unique in general.
Whether the optimal R is unique is an open question; however, for every optimized pole
$v_i$ an indicator may be calculated which, if nonzero, guarantees that $v_i$ is indeed a pole
of R.

In the practical computations documented in §5 the optimization of the $v_i$ was
performed using the same two algorithms as in [5]: for small N we used a discrete
differential correction algorithm according to [11], while for larger N the simulated annealing
method of [8] was applied. Both methods will in principle locate a desired global
optimum. The first method achieves it in a systematic and guaranteed way, evaluating the
error not continuously but on a fine grid; the simulated annealing method cannot be
guaranteed to find the global extremum but, when used for an extensive search, will
produce a reasonable approximation of it.

As mentioned in [5], our way of attaching poles to the interpolating polynomial has a
very nice property: the approximation error can only decrease or at worst stay constant
with a growing number of poles, this in sharp contrast with classical rational
interpolation; when a new unknown, say $v_j$, is added to the set of variables, $\{v_1, \ldots, v_{j-1}\}$, the
optimal values of the latter are a feasible vector for the higher dimensional optimization.

Let us conclude this section with a comment on the use of the nomenclature
“attaching the poles”. In classical rational interpolation, the poles of the interpolant are
determined by the data. There too, however, one sometimes wishes to prescribe the
location of the poles (with corresponding decrease of the number of degrees of freedom):
many authors then speak of “assigning”, or “prescribing” the poles. In that sense one
cannot “assign” poles to a polynomial, which obviously cannot have poles. We thus
start with the interpolating polynomial and its poles at infinity and make it a rational
interpolant by bringing the poles into an optimal position in $\mathbb{C}$. We call this procedure
“attaching poles”, to distinguish it from the process of forcing a rational function to
have a pole at a particular place.

4  Derivatives  of  the  optimal  interpolant  with  shifted  points 

As mentioned in §1, one of the reasons for shifting the points from their Cebysev position
toward the interior of the interval is the improvement of the condition of the derivatives
resulting from such a shift. Besides r, we will evaluate also r′ and r″ as approximants of
f′, resp. f″, and estimate $\|r' - f'\|_\infty$ and $\|r'' - f''\|_\infty$.

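Schneider and Werner [14] have noticed that every rational interpolant $R \in \mathcal{R}_{N,N}$,
written in its barycentric form

$$R(y) = \sum_{k=0}^{N} \frac{u_k}{y - y_k}\, f_k \Bigg/ \sum_{k=0}^{N} \frac{u_k}{y - y_k},$$

can easily be differentiated. The formulae for the first two derivatives read

$$R'(y) = \begin{cases} \displaystyle \sum_{k=0}^{N} \frac{u_k}{y - y_k}\, R[y, y_k] \Bigg/ \sum_{k=0}^{N} \frac{u_k}{y - y_k}, & y \text{ not a node}, \\[2ex] \displaystyle -\Bigl(\sum_{k \ne i} u_k R[y_i, y_k]\Bigr) \Big/ u_i, & y = y_i, \end{cases}$$

$$R''(y) = \begin{cases} \displaystyle 2 \sum_{k=0}^{N} \frac{u_k}{y - y_k}\, R[y, y, y_k] \Bigg/ \sum_{k=0}^{N} \frac{u_k}{y - y_k}, & y \text{ not a node}, \\[2ex] \displaystyle -2 \Bigl(\sum_{k \ne i} u_k R[y_i, y_i, y_k]\Bigr) \Big/ u_i, & y = y_i, \end{cases}$$

with $R[z, y_k] = \frac{R(z) - f_k}{z - y_k}$ and $R[z, z, y_k] = \frac{R'(z) - R[z, y_k]}{z - y_k}$. The chain rule then yields, for $r(x) = R(g^{-1}(x))$,

$$r'(x) = \frac{R'(y)}{g'(y)}, \qquad r''(x) = \frac{R''(y)}{[g'(y)]^2} - R'(y)\,\frac{g''(y)}{[g'(y)]^3}. \tag{4.1}$$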
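A minimal Python sketch of the first-derivative evaluation away from the nodes, followed by the chain rule for r′, may help to fix ideas. It is a hedged example: it uses the quotient-rule identity R′(y) = Σ_k u_k/(y − y_k) · (R(y) − f_k)/(y − y_k) / Σ_k u_k/(y − y_k), takes the arcsin map of this section as g with an arbitrary α = 0.5, and uses a test function sin(3x) of our own choosing; all names are hypothetical.

```python
import numpy as np

def bary_value_and_derivative(y, nodes, values, weights):
    """R(y) and R'(y) of a barycentric interpolant at points y that are not nodes."""
    y = np.atleast_1d(np.asarray(y, dtype=float))
    diff = y[:, None] - nodes[None, :]
    terms = weights / diff                         # u_k / (y - y_k)
    denom = terms.sum(axis=1)
    R = (terms @ values) / denom
    div_diff = (R[:, None] - values[None, :]) / diff   # R[y, y_k]
    dR = np.sum(terms * div_diff, axis=1) / denom
    return R, dR

alpha, N = 0.5, 40
y_nodes = np.cos(np.arange(N + 1) * np.pi / N)
weights = (-1.0) ** np.arange(N + 1)
weights[[0, -1]] *= 0.5

g = lambda y: np.arcsin(alpha * y) / np.arcsin(alpha)
g_prime = lambda y: alpha / (np.arcsin(alpha) * np.sqrt(1.0 - (alpha * y) ** 2))
f = lambda x: np.sin(3.0 * x)                      # illustrative test function

y = np.linspace(-0.9, 0.9, 50)                     # evaluation points, none is a node
R, dR = bary_value_and_derivative(y, y_nodes, f(g(y_nodes)), weights)

r_prime = dR / g_prime(y)                          # chain rule r'(x) = R'(y) / g'(y)
err_value = np.max(np.abs(R - f(g(y))))
err_deriv = np.max(np.abs(r_prime - 3.0 * np.cos(3.0 * g(y))))
```

The sketch deliberately leaves out the node case and the second derivative, which follow the same pattern with the corresponding divided differences.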


Specifically, in our calculations we have used the map suggested by Kosloff and Tal-Ezer

$$x = g(y) = \frac{\arcsin(\alpha y)}{\arcsin\alpha}, \qquad 0 < \alpha < 1.$$

α, P = 4, P = 6
        1.42e-6   5.83e-8   9.38e-9   1.30e-9
        3.11e-5   6.69e-7   2.48e-8   4.21e-9   4.23e-10
        8.06e-6   1.60e-7   9.47e-10  1.27e-10
0.9     1.12e-6   1.97e-8   5.90e-10  3.94e-11  2.05e-11
        2.78e-7   4.47e-9   1.29e-10  1.36e-11  3.82e-12
        1.85e-7   2.93e-9   8.27e-11  4.20e-12  3.88e-12

Tab. 1. Errors when approximating f with increasing P and α in Example 1.


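In the limiting cases, α → 0 keeps the points at their Cebysev position, whereas α → 1
renders them equidistant. The derivatives of g are given by

$$g'(y) = \frac{\alpha}{\arcsin\alpha}\,\frac{1}{\sqrt{1 - (\alpha y)^2}}, \qquad g''(y) = \frac{\alpha^3}{\arcsin\alpha}\,\frac{y}{\sqrt{(1 - (\alpha y)^2)^3}},$$

so that in (4.1)

$$\frac{g''(y)}{[g'(y)]^3} = (\arcsin\alpha)^2\, y.$$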
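The effect of the arcsin map on the nodes, and the simplification used in (4.1), can be checked with a short Python sketch. This is a hedged illustration: the map g(y) = arcsin(αy)/arcsin(α) and its derivatives are coded directly, while the function names and the values N = 16, α = 0.9 are arbitrary choices of ours.

```python
import numpy as np

def g(y, alpha):
    """Arcsin map from the y-space to the x-space."""
    return np.arcsin(alpha * y) / np.arcsin(alpha)

def g1(y, alpha):
    """First derivative of the map."""
    return alpha / (np.arcsin(alpha) * np.sqrt(1.0 - (alpha * y) ** 2))

def g2(y, alpha):
    """Second derivative of the map."""
    return alpha**3 * y / (np.arcsin(alpha) * (1.0 - (alpha * y) ** 2) ** 1.5)

N, alpha = 16, 0.9
y_nodes = np.cos(np.arange(N + 1) * np.pi / N)     # Cebysev points of the second kind
x_nodes = g(y_nodes, alpha)                        # shifted interpolation points

# Smallest node distance before and after the shift.
min_gap_cheb = np.min(np.abs(np.diff(y_nodes)))
min_gap_shifted = np.min(np.abs(np.diff(x_nodes)))

# The factor multiplying R'(y) in the second-derivative formula.
y = np.linspace(-1.0, 1.0, 101)
ratio = g2(y, alpha) / g1(y, alpha) ** 3
```

The endpoints are kept fixed by the map, while the smallest gap between neighbouring nodes grows, which is the mechanism behind the better conditioning of the derivatives.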


5  Numerical  evidence 


We now report on practical computations, performed on two examples, which
demonstrate the efficiency of point shifts for improving the rational interpolants with optimized
denominators. These examples share the property that the difficult part of f lies in the
center of I so that the shift of the points toward a more equidistant position naturally
improves the quality of the information provided to the interpolation method.


α, P = 2, P = 4
        5.27e-3   1.26e-3   4.85e-6
0.5     2.67e-3   5.87e-4   2.33e-6   4.03e-7   4.63e-8
0.75    7.47e-4   1.49e-5   5.16e-7   9.44e-8   1.30e-8
0.9     1.14e-4   2.01e-6   6.56e-8   4.28e-9   2.16e-9
0.95    2.97e-5   4.99e-7   1.48e-8   1.59e-9   4.52e-10
0.96    2.01e-5   9.52e-9   4.80e-10  4.70e-10

Tab. 2. Errors when approximating f′ with increasing P and α in Example 1.


The sup-norm $\|\cdot\|_\infty$ has thereby been estimated by considering 1000 equally
spaced points $x_\ell$, ℓ = 1(1)1000, on the interval [-5/4, 5/4] and computing
the maximal absolute value of the error at those $x_\ell$ lying in [-1, 1].

Jean-Paul Berrut and Hans D. Mittelmann

Example 5.1 We have first revisited Example 3 of [5], which displays in the center
of I a slope increasing with a positive parameter, here denoted by ε,

$$f(x) = \cos \pi x + \frac{\operatorname{erf}(\delta x)}{\operatorname{erf}(\delta)}, \qquad \delta = \sqrt{\varepsilon/2},$$

where erf denotes the error function (see [5] for a graph).

In Table 1 we give the results obtained with ε = 500 and N = 81, increasing numbers
P of poles and increasing α. Tables 2 and 3 display the same information for the
approximation of f′ and f″ with r′ and r″ as given by the formulae (4.1). The combination
of extra poles and a point shift brings about 7 digits of accuracy, whereas the point shift
alone makes only for 2-3. The improvement in the derivatives is especially remarkable:
the error in the second derivative decreases from the useless value of 9.26 to about $10^{-7}$.


9.26      .26       9.50e-1   9.30e-2   1.59e-2   9.18e-3
4.05e-2   2.07e-2   5.48e-3   6.49e-4   1.23e-4   7.36e-5
4.82e-4   2.18e-4   6.25e-5   8.86e-6   1.88e-6   1.29e-6
7.85e-5   3.75e-5   9.53e-6   4.93e-7   1.75e-7   6.00e-8
1.46e-5   4.91e-6   1.26e-6   2.34e-7   5.31e      5.57e-8

Tab. 3. Errors when approximating f″ with increasing P and α in Example 1.

Example 5.2 Example 3 in [5] has demonstrated that the attachment of poles may be
very effective in improving the approximation of oscillatory functions. Here we change
the function to

$$h(x) = e^{-a x^2} \sin bx, \qquad a > 0, \quad b > 0,$$

so that the most oscillatory part lies in the center of the interval.

Results with a = 5, b = 25, N = 31, P = 0 and P = 2 are given in Table 4. In contrast
with the preceding example, here the point shift brings much more improvement than the
attachment of poles, about 6-7 digits, an especially heartening fact for the derivatives,
for which the interpolants without shift are useless approximants.

Acknowledgement: The authors wish to thank Peter Graves-Morris for his comments
which have enhanced the present text.

Abstract

Mathematical models of blood flow are inevitably embedded in models of human
thermoregulation because they take the role of the most significant heat distributor in models
of the human thermal system [14, 6]. Models of human thermoregulation have a wide
range of applications, e.g. for the prediction of the impact of accidents, diseases and
clinical treatments (see [14] and the references therein). The application of our interest is the
prediction of the influence of cooling on the heat distribution in premature infants, see
Section 2. In Section 3 we discuss the requirements of a reliable thermoregulation model
while the governing equation is described in paragraph four. The employed blood flow
model is discussed within Section 5. Section 6 deals with numerical results, followed by
concluding remarks in the last paragraph.

1  Motivation 

Lack  of  oxygen  of  the  fetus  or  newborn  is  known  to  be  an  important  cause  for  injuries  of 
the  developing  brain  [9].  Experimental  studies  have  shown  that  the  neuronal  loss  evolves 
over  several  days  after  such  an  incident  [8].  An  important  factor  influencing  the  degree 
and  distribution  of  neuronal  loss  is  the  cerebral  temperature,  i.e.  lowering  the  cerebral 
temperature  can  prevent  much  damage  [5]. 

The question arises whether it is possible to lower the cerebral temperature of an infant by
2-3 K by manipulating the environment inside an incubator while the rest of
the body maintains a pleasant temperature. The objective of this paper is to discuss the
mathematical means which can be used to predict an answer to that question
by the use of numerical simulations.

2  Modeling  the  thermoregulation  of  premature  infants 

The term thermoregulation stands for the measures taken by the body to hold a pleasant
temperature [4]. Models for thermoregulation consist of two parts: the active and the
passive system [6]. The active system consists of the regulatory mechanisms shivering
(heat production within the muscles attached to the skeleton), vasomotion (control over
the degree of blood flow within the skin) and sweating (control over the degree of
effectiveness of heat transfer between the infant and the surrounding air). The passive system
is the combination of the physical human body and the heat transfer in it and at its
surface. The idea behind this distinction is that the active system has a controlling
influence over the passive system. Naturally, only results obtained by the complete model
can be compared with available real life data.

Concerning  premature  infants,  it  is  known  that  shivering  and  sweating  are  not  of 
importance  for  the  modelling  process  [4,  13],  while  vasomotion  should  not  be  of  great 
concern  for  our  special  application  [13].  The  modeling  of  the  passive  system  demands  the 
discretization  of  the  body  and  the  modeling  of  metabolic  heat  production  and  blood  flow. 

We do not consider phenomena which are related to environmental conditions, namely
the response to air convection, the possibility of gaining or losing heat due to radiation and
heat loss due to evaporation in dependence on pressure, temperature and humidity of
the surrounding air, assuming that these are controllable by the use of an incubator [13].

In  order  to  give  an  answer  to  the  defined  question  by  use  of  numerical  simulations, 
a  model  needs  to  deliver  detailed  temperature  profiles  within  the  head  and  a  detailed 
resolution  of  the  heat  transfer  processes  in  the  body.  It  should  be  applicable  to  different 
size  neonates  whereby  aspects  like  the  anatomy  and  the  thermal  maturity  have  to  be 
considered.  With  the  exception  of  the  blood  flow  model,  these  aspects  can  be  defined  via 
a  suitable  geometry  and  the  use  of  real  life  data  for  spatially  dependent  rates  of  metabolic 
heat  production  within  a  numerical  method  [7,  2].  This  also  incorporates  that  existing 
numerical  methods  made  for  the  simulation  of  thermoregulation  of  adults  are  of  no  use 
in  the  given  context  since  studies  have  shown  [3]  that  a  detailed  modeling  of  geometry 
and  tissue  composition  is  necessary  in  order  to  obtain  relevant  temperature  profiles.  As 
it  can  be  shown  experimentally  [7,  2]  in  agreement  to  theoretical  discussions  concerning 
thermoregulation  models  of  adults  [6,  14],  the  use  of  a  blood  flow  model  greatly  affects 
the  computed  numerical  solutions. 

3  Analysis  of  the  blood  flow  model 

The bio-heat equation derived by Pennes [10] forms the basis of the majority of models
for human thermoregulation in use today [14, 6]. It describes the dissipation of heat in
a homogeneous, infinite tissue volume. For two spatial dimensions, it can be written in
the form

$$c(\mathbf{x})\,\rho(\mathbf{x})\,\partial_t T(\mathbf{x},t) = \operatorname{div}\left[\lambda(\mathbf{x})\,\nabla T(\mathbf{x},t)\right] + f(\mathbf{x},t). \tag{3.1}$$

Thereby, the temperature T depends on the spatial variable $\mathbf{x} = (x_1, x_2)^T$ as well
as on time t. Furthermore, λ(x), c(x) and ρ(x) denote the heat conductivity, specific
heat capacity and density of the tissue, respectively. The term f(x, t) can be decomposed
via $f(\mathbf{x}, t) = Q_M(\mathbf{x}) + Q_B(\mathbf{x}, t)$ into parts corresponding to metabolic heat production
$Q_M(\mathbf{x})$ and blood flow $Q_B(\mathbf{x}, t)$.

As already indicated, the term $Q_M(\mathbf{x})$ can be defined by the use of real life data
[7]. The formulation of the source term due to blood flow is based on variations of the
following procedure [6, 14]. The idea is that the body is supplied from a central pool of
blood by the major arteries. Before the tissue is perfused, the temperature of the arterial
blood mixes with the temperature of venous blood flowing in adjacent veins. After that,
the arterial blood exchanges heat with the tissue in the capillaries and becomes venous
blood. The venous blood is collected in the major veins and its temperature mixes with
the temperature of arterial blood in the adjacent arteries before it flows back into the
blood pool.

Since equation (3.1) deals with the change of thermal energy per unit volume, the
term $Q_B(\mathbf{x}, t)$ takes the form

$$Q_B(\mathbf{x}, t) = c_B\,\rho_B\,CCX(\mathbf{x})\,BF(\mathbf{x})\,\left[T_B(t) - T(\mathbf{x}, t)\right], \tag{3.2}$$

whereby $T_B(t)$ denotes the time-dependent mean value of the temperature of the blood
within the blood pool. We also assume that the density $\rho_B$ of the blood and the
specific heat capacity $c_B$ of the blood are constants.

The described modeling results in a differential equation for the temporal evolution
of the temperature within the blood pool, namely in

$$m_B\,c_B\,\partial_t T_B(t) = \int_D \rho_B\,c_B\,CCX(\mathbf{x})\,BF(\mathbf{x})\,d\mathbf{x}\;\left[T_V(t) - T_B(t)\right]. \tag{3.3}$$

Thereby, the total blood mass $m_B$, the time dependent mean value of the temperature
of the venous blood $T_V(t)$, and locally defined tissue-dependent measures for the blood
perfusion BF(x) and the counter-current heat exchange CCX(x) are introduced.

Equation (3.3) shows that the temporal change of the blood pool temperature is
proportional to the difference to the temperature of the venous blood. The outlined idea
leads to the modeling of the temperature of the venous blood as

$$T_V(t) = \frac{\int_D CCX(\mathbf{x})\,BF(\mathbf{x})\,T(\mathbf{x},t)\,d\mathbf{x}}{\int_D CCX(\mathbf{x})\,BF(\mathbf{x})\,d\mathbf{x}}, \tag{3.4}$$

which is also usable when only steady states are considered [7]. The crucial terms in
the order of importance are the blood perfusion BF(x) and the counter-current heat
exchange CCX(x).

There is much debate about the choice of these functions in the literature [14, 6]. This
debate arises because the representation of blood circulation is substituted by a rather
simple model formulation. The cure to this disadvantage is generally sought by exploring
more and more detailed models of microstructure, organs, etc., or it is sought by a better
modeling of control mechanisms of the active system in the case of adults [14, 6].

The main drawback of the described blood flow model is the blood pool idea itself. To our knowledge, this has not yet been pointed out in any mathematical description of the model in the literature; it can be illustrated as follows. Let a detailed geometry be given with a stationary temperature distribution, together with a homogeneous neutral temperature on the whole boundary as initial state. Assume that we start a numerical computation in which selective cooling is applied at the neck. By heat conduction in the tissue, the cooling effect, computed from the discretized heat gradient and the heat conductivity of the local tissue, propagates into the interior of the domain. Concerning the blood flow, the averaging step in (3.4) captures the local cooling effect, which results in a slightly cooler average temperature of the venous blood over the whole domain than in the initial state. Employing this value in (3.3) results in a slight negative change of the blood pool temperature. Evaluating the source term (3.2) for the control volumes located in the vicinity of the neck, we notice that strong cooling is locally compensated by the combination of a) the source term due to blood flow, which is mostly influenced by the neutral blood temperature in the rest of the body, and b) the source term due to metabolic heat production, which is not influenced at all by the change in the boundary temperature. As a result, the effect of a local cooling mechanism is instantly distributed over the whole domain, while the weighted mean value of the temperature over the domain evens out local cooling mechanisms. The validity of this reasoning is confirmed by numerical results [7, 2] and by an exemplary result shown in Section 6.

The non-local nature of the described blood flow model can be seen directly by applying an implicit time stepping strategy. Owing to the integration over the whole computational domain in (3.4), one ends up with a fully occupied matrix after the usual linearization step, as was already recognized in [7] in the context of steady state calculations.
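The fill-in can be made concrete with a minimal sketch. It assumes a steady state, so that the blood pool temperature equals $T_V$, and a cell-wise discretization of (3.2) and (3.4); the names `source_jacobian`, `a` and `w` are illustrative and are not taken from [7].

```python
import numpy as np

def source_jacobian(a, w):
    """Jacobian dQ_i/dT_j of the discrete blood source Q_i = a_i * (T_V - T_i).

    Assumptions made for this illustration only:
      a[i] = |sigma_i| * c_B * rho_B * CCX(x_i) * BF(x_i)  (source weight of cell i)
      w[j] = |sigma_j| * CCX(x_j) * BF(x_j)                (averaging weight in (3.4))
      steady state, so the blood pool temperature equals T_V.
    """
    a = np.asarray(a, dtype=float)
    w = np.asarray(w, dtype=float)
    # T_V = sum_j w_j T_j / sum_j w_j couples every cell to every other cell,
    # so the first term is a dense rank-one matrix.
    return np.outer(a, w / w.sum()) - np.diag(a)

rng = np.random.default_rng(0)
w = rng.uniform(0.5, 1.5, size=6)   # hypothetical cell weights
a = 3.0 * w                         # hypothetical source weights
J = source_jacobian(a, w)
print(np.count_nonzero(J), "of", J.size, "entries are nonzero")
```

Every entry of `J` is nonzero, whereas the conduction part of a finite volume matrix only couples neighbouring cells. Each row of `J` sums to zero, reflecting that a uniform temperature shift leaves the source unchanged.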

We now illuminate a further property of the blood flow model. To this end, let the abbreviations $\alpha = \rho_B c_B$, $\beta = \int_D K_B(x) B(x)\, dx$ and $\gamma = \rho_B / m_B$ hold, where $K_B(x) B(x)$ stands for the product $\mathrm{CCX}(x)\,\mathrm{BF}(x)$ appearing in (3.2)-(3.4). A straightforward computation gives

$$T_B(t) = T_V(t) - \frac{1}{\beta\gamma}\, \frac{d}{dt} T_B(t). \tag{3.5}$$

Note that $\alpha$, $\beta$ and $\gamma$ are positive constants. Consider a steady state situation as initial state, i.e. $T_B = T_V$ holds. If the body is heated, the temperature within the body increases and so $T_V$ will increase. This has the effect that the blood pool temperature $T_B$ will increase in the near future, i.e. $\frac{d}{dt} T_B(t) > 0$. We now investigate the net effect of the blood flow. Integration of the source over the computational domain $D$ results in

$$\int_D Q_B(x,t)\, dx = \alpha \left[ \beta\, T_B(t) - \int_D K_B(x) B(x)\, T(x,t)\, dx \right] \overset{(3.5)}{=} -\frac{\alpha}{\gamma}\, \frac{d}{dt} T_B(t).$$

Employing $\frac{d}{dt} T_B(t) > 0$, we see that the total of all sources in the body is negative, i.e. if the body is exposed to heat, the blood in the blood pool cools the increasingly warm body in the mean and, in doing so, takes up heat from it. The blood pool and the body are to be seen as two separate systems which are connected via heat fluxes, and so one can consider the blood pool as a regulator.
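The sign argument can be checked numerically. The following sketch discretizes the domain into cells and verifies that the integrated source equals $-(\alpha/\gamma)\,\frac{d}{dt}T_B$; all numerical values (cell volumes, perfusion weights, blood properties) are made-up illustrative assumptions, and `blood_pool_balance` is a hypothetical helper name.

```python
import numpy as np

def blood_pool_balance(vol, kb, T, T_B, rho_B, c_B, m_B):
    """Return (integrated source, -(alpha/gamma) dT_B/dt, dT_B/dt).

    vol[i] : cell volume |sigma_i|            (illustrative)
    kb[i]  : product CCX(x_i) * BF(x_i)       (illustrative)
    T[i]   : tissue temperature in cell i
    """
    alpha = rho_B * c_B
    beta = np.sum(vol * kb)
    gamma = rho_B / m_B
    T_V = np.sum(vol * kb * T) / beta                      # venous temperature, (3.4)
    dTB_dt = beta * gamma * (T_V - T_B)                    # (3.3) solved for dT_B/dt
    total_source = np.sum(vol * alpha * kb * (T_B - T))    # integral of (3.2)
    return total_source, -alpha / gamma * dTB_dt, dTB_dt

rng = np.random.default_rng(1)
vol = rng.uniform(0.5, 1.5, size=50)
kb = rng.uniform(1e-4, 5e-4, size=50)
T = 310.0 + rng.uniform(0.0, 1.0, size=50)   # heated body, warmer than the pool
total, predicted, dTB_dt = blood_pool_balance(
    vol, kb, T, T_B=310.0, rho_B=1060.0, c_B=3600.0, m_B=0.3)
print(total, predicted, dTB_dt)
```

With the tissue warmer than the blood pool, `dTB_dt` is positive and the integrated source is negative, in line with the argument above.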

4  Numerical  method  and  experiments 

The following numerical approximation of the unsteady bio-heat equation (3.1) represents a convenient extension of the finite volume method developed in [7], which has proven to be a robust, accurate and reliable algorithm in the context of steady state temperature distributions. However, finite volume schemes are by construction based on the integral form of the governing equation. In order to apply Gauss's integral theorem it is necessary to write the equation in divergence form. Therefore, we introduce the auxiliary variable $k(x) = \rho(x) c(x)$ and the auxiliary temperature $\tilde{T}(x,t) = k(x) T(x,t)$ into the governing equation, and consequently the bio-heat equation (3.1) reads

$$\frac{d}{dt} \int_\sigma \tilde{T}(x,t)\, dx = \int_{\partial\sigma} \lambda(x)\, \nabla \frac{\tilde{T}(x,t)}{k(x)} \cdot n(x)\, ds + \int_\sigma \left[ Q_B(x,t) + Q_M(x) \right] dx \tag{4.1}$$

for all control volumes $\sigma \subset D$, see [2]. In order to solve equation (4.1) numerically, the spatial domain $D$ is decomposed into a finite number of sub-domains.

Fig. 1. General form of a control volume of the triangulation (left) and its boundary (right).

We start from an arbitrary conforming triangulation $\mathcal{D}_h$ of the domain $D$, called the primary mesh, which consists of finitely many triangles; the corresponding nodes are denoted by $x_i \in D$. Based on the triangulation, a discrete control volume $\sigma_i$ is defined as the open subset of $\mathbb{R}^2$ that includes the node $x_i$ and is bounded by the straight lines connecting the midpoints of the edges of the adjacent triangles $V_j$ (i.e. $x_i \in \partial V_j$) with their barycentres (see Figure 1). The union $\mathcal{B}_h$ of all boxes is called the secondary mesh. A finite volume method represents a discretization of the evolutionary equation (4.1) for cell averages defined by $(M\tilde{T})(t)|_\sigma = (1/|\sigma|) \int_\sigma \tilde{T}(x,t)\, dx$, where $|\sigma|$ denotes the volume of the box $\sigma$. With respect to the secondary mesh $\mathcal{B}_h$ we can write the integral form (4.1) as


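$$\frac{d}{dt} (M\tilde{T})(t)\big|_{\sigma_i} = \frac{1}{|\sigma_i|} \left( \int_{\partial\sigma_i} \lambda(x)\, \nabla \frac{\tilde{T}(x,t)}{k(x)} \cdot n(x)\, ds + \int_{\sigma_i} Q_B(x,t)\, dx + \int_{\sigma_i} Q_M(x)\, dx \right), \quad \forall\, \sigma_i \in \mathcal{B}_h.$$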


As in a finite element method, the boundary integral is evaluated using a piecewise constant distribution of the heat coefficient $\lambda$ and a piecewise linear distribution of the auxiliary temperature $\tilde{T}$ with respect to the triangles of the triangulation used. Note that the source term remains unchanged and its calculation is given by

$$\int_{\sigma_i} Q_M(x)\, dx = |\sigma_i|\, Q_M(x_i),$$

$$\int_{\sigma_i} Q_B(x,t)\, dx = |\sigma_i|\, c_B \rho_B\, \mathrm{CCX}(x_i)\, \mathrm{BF}(x_i)\, \left[T_B(t) - T(x_i,t)\right].$$

The  computation  of  the  blood  pool  temperature  is  directly  performed  by  an  explicit  time 
discretization  of  equation  (3.3).  Thereby,  the  temperature  of  the  venous  blood  is  given 
by  equation  (3.4). 
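A minimal sketch of this explicit update is given below. It uses a forward Euler step of (3.3) with $T_V$ evaluated from (3.4) on the cells; the function name `blood_pool_step`, the cell data and the step size are illustrative assumptions, and the tissue temperatures are frozen here only to isolate the blood pool dynamics.

```python
import numpy as np

def blood_pool_step(T_B, T_cells, vol, kb, rho_B, m_B, dt):
    """One forward Euler step of (3.3); returns the new T_B and the current T_V.

    vol[i] is the cell volume and kb[i] the product CCX(x_i) * BF(x_i)
    (both hypothetical values chosen for this illustration).
    """
    w = vol * kb
    T_V = np.dot(w, T_cells) / w.sum()     # venous temperature, (3.4)
    rate = rho_B * w.sum() / m_B           # c_B cancels on both sides of (3.3)
    return T_B + dt * rate * (T_V - T_B), T_V

vol = np.full(4, 0.25)
kb = np.array([1.0, 2.0, 0.5, 1.5]) * 1e-3
T_cells = np.array([309.0, 310.0, 308.5, 309.5])
rho_B, m_B = 1060.0, 0.3
rate = rho_B * np.sum(vol * kb) / m_B
dt = 0.5 / rate                            # dt * rate < 1 keeps the update monotone

T_B = 310.0
for _ in range(60):
    T_B, T_V = blood_pool_step(T_B, T_cells, vol, kb, rho_B, m_B, dt)
print(T_B, T_V)
```

With frozen tissue temperatures the blood pool temperature relaxes to $T_V$; in the full scheme the cell temperatures would be advanced alongside it in every time step.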

It is worth noting that the method reduces to the scheme presented in [7] in the context of a steady state solution, and therefore excellent properties such as the discrete min-max principle are maintained in such a situation. Owing to limited space, we restrict ourselves to steady state calculations using the described method. We distinguish layers of skin, fat, bone and kernel by different rates of metabolism, specific heat capacity and blood perfusion associated with the regions depicted in Figure 2. As boundary conditions we employ a comfortable boundary temperature of 309.15 K at head, back, legs and belly, while we set 299.15 K at the neck, i.e. we selectively cool the neck. In reality, this corresponds to the situation where the infant is wearing a water-filled collar with the purpose of cooling the blood flowing into the brain through the arteries adjacent to the skin.

In Figure 3 (a) we can see the temperature distribution in the two-dimensional discretized idealization of the body of a premature infant. Here no blood flow and no metabolic heat production are applied, so that the depicted distribution of heat is influenced only by the heat conductivity of the tissues employed. The situation where tissue-dependent metabolic heat production is taken into account is shown in Figure 3 (b). Note that the heat sources visualized within the picture not only have local effects, they also influence the mean value of the temperature of the blood pool. In Figure 3 (c), blood flow is additionally taken into account.

It is evident that the blood flow has the effect outlined in Section 5. In particular, the numerical solution gives no hint of the fact that in reality there is a transport of cool blood to the brain and also a transport of blood by the veins coming from the brain.


Fig. 3. Comparison of steady state situations (a) only with heat conduction, (b) with heat conduction and metabolic heat production and (c) with blood flow additionally taken into account (from top to bottom). The scale shows the temperature in K, ranging from 299.15 to 310.3.


5  Concluding  remarks 

The  range  of  applicability  of  the  described  blood  flow  model  is  restricted  to  situations 
where  it  makes  sense  to  employ  a  mean  value  of  the  whole  blood,  e.g.  if  the  whole  body  is 
exposed  for  a  longer  time  to  the  same  temperature.  For  a  clinical  application  where  the 
effects  of  local  cooling  or  heating  have  to  be  studied,  caution  is  required  when  dealing 
with  the  results  achieved  by  employing  variations  of  the  described  model. 
Abstract

Our interest lies in describing the zero behaviour of Gauss hypergeometric polynomials $F(-n, b; c; z)$ where $b$ and $c$ are arbitrary parameters. In general, this problem has not been solved and even when $b$ and $c$ are both real, the only cases that have been fully analysed impose additional restrictions on $b$ and $c$. We review recent results that have been proved for the zeros of several classes of hypergeometric polynomials $F(-n, b; c; z)$ where $b$ and $c$ are real. We show that the number of real zeros of $F(-n, b; c; z)$ for arbitrary real values of the parameters $b$ and $c$, as well as the intervals in which these zeros (if any) lie, can be deduced from corresponding results for Jacobi polynomials.


1  Introduction 

The Gauss hypergeometric function, or ${}_2F_1$, is defined by

$$F(a, b; c; z) = 1 + \sum_{k=1}^{\infty} \frac{(a)_k (b)_k}{(c)_k} \frac{z^k}{k!}, \qquad |z| < 1,$$

where $a$, $b$ and $c$ are complex parameters and

$$(a)_k = a(a+1) \cdots (a+k-1) = \Gamma(a+k)/\Gamma(a)$$

is Pochhammer's symbol. When $a = -n$ is a negative integer, the series terminates and reduces to a polynomial of degree $n$, called a hypergeometric polynomial. Our focus lies in the location of the zeros of $F(-n, b; c; z)$ for real values of $b$ and $c$.

Hypergeometric polynomials are connected with several different types of orthogonal polynomials, notably Chebyshev, Legendre, Gegenbauer and Jacobi polynomials. In the cases of Chebyshev and Legendre polynomials, the connection demands fixed special values of the parameters $b$ and $c$, namely (cf. [1], p.561)

$$F(-n, n; \tfrac{1}{2}; z) = T_n(1 - 2z), \qquad F(-n, n+1; 1; z) = P_n(1 - 2z),$$

respectively. However, in the cases of Gegenbauer and Jacobi polynomials, we have

$$F(-n, n + 2\lambda; \lambda + \tfrac{1}{2}; z) = \frac{n!}{(2\lambda)_n}\, C_n^{\lambda}(1 - 2z) \tag{1.1}$$

and

$$F(-n, \alpha + \beta + 1 + n; \alpha + 1; z) = \frac{n!}{(\alpha + 1)_n}\, P_n^{(\alpha,\beta)}(1 - 2z), \tag{1.2}$$

respectively. Since the zeros of orthogonal polynomials are well understood, we expect the connections (1.1) and (1.2) to be very useful in analysing the zeros of $F(-n, b; c; z)$. Conversely, if the zeros of $F(-n, b; c; z)$ are known, this leads to new information about the zero distribution of Gegenbauer or Jacobi polynomials for values of their parameters that lie outside the range of orthogonality of these polynomials.
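As an illustration of how such connections can be explored numerically, the sketch below builds the coefficients of $F(-n, b; c; z)$ from the ratio of consecutive series terms and compares the Legendre case $F(-n, n+1; 1; z) = P_n(1-2z)$ with NumPy's Legendre evaluation. The helper name `hyp_poly_coeffs` is an assumption made for this example.

```python
import numpy as np
from numpy.polynomial import legendre

def hyp_poly_coeffs(n, b, c):
    """Coefficients a_0, ..., a_n of F(-n, b; c; z) = sum_k a_k z^k."""
    coeffs = [1.0]
    for k in range(n):
        # a_{k+1} / a_k = (-n + k)(b + k) / ((c + k)(k + 1))
        coeffs.append(coeffs[-1] * (-n + k) * (b + k) / ((c + k) * (k + 1)))
    return np.array(coeffs)

n = 5
coeffs = hyp_poly_coeffs(n, n + 1, 1)
z = np.linspace(-0.5, 1.5, 9)
lhs = np.polyval(coeffs[::-1], z)                 # F(-n, n+1; 1; z)
rhs = legendre.legval(1 - 2 * z, [0] * n + [1])   # P_n(1 - 2z)
zeros = np.sort(np.roots(coeffs[::-1]).real)
print(np.max(np.abs(lhs - rhs)))
print(zeros)
```

Since the zeros of $P_n$ lie in $(-1, 1)$, the substitution $x = 1 - 2z$ places the zeros of this hypergeometric polynomial in $(0, 1)$, which the computed `zeros` reflect.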


This paper is organized as follows. In Section 2 we give a self-contained review of recent results regarding the zeros of several special classes of hypergeometric polynomials. Section 3 contains results originally due to Klein [9] which detail the number and location of the real zeros of $F(-n, b; c; z)$ for arbitrary real values of $b$ and $c$. We provide simple proofs using results proved in [13].


2  Zeros  of  special  classes  of  hypergeometric  polynomials 

We begin with a few general remarks. Since we shall assume throughout our discussion that $b$ and $c$ are real parameters, all non-real zeros of $F(-n, b; c; z)$ must occur in complex conjugate pairs. In particular, if $n$ is odd, $F$ must always have at least one real zero. Further, if $b = -m$ where $m < n$, $m \in \mathbb{N}$, $F(-n, b; c; z)$ reduces to a polynomial of degree $m$. However, since we are interested in the behaviour of the zeros of $F(-n, b; c; z)$ as $b$ and/or $c$ vary through real values, we shall adopt the convention that $F(-n, -m; c; z) = \lim_{b \to -m} F(-n, b; c; z)$. This ensures that the zeros of $F$ vary continuously with $b$ and $c$. Note also that $F(-n, b; c; z)$ is not defined when $c = 0, -1, \ldots, -n+1$. Regarding the multiplicity of zeros, a hypergeometric function $w = F(a, b; c; z)$ satisfies the differential equation

$$z(1 - z) w'' + [c - (a + b + 1) z]\, w' - ab\, w = 0,$$

so if $w(z_0) = w'(z_0) = 0$ at some point $z_0 \neq 0$ or $1$, it would follow that $w \equiv 0$. Thus multiple zeros of $F(-n, b; c; z)$ can only occur at $z = 0$ or $1$.

2.1 Quadratic transformations

The class of hypergeometric polynomials that admit a quadratic transformation is specified by a necessary and sufficient condition due to Kummer (cf. [1], p.560). There are twelve polynomials in this class (cf. [14], p.124):

$F(-n, b; 2b; z)$, $F(-n, b; -n - b + 1; z)$, $F(-n, b; \tfrac{1}{2}(-n + b + 1); z)$,

$F(-n, b; \tfrac{1}{2}; z)$, $F(-n, -n + \tfrac{1}{2}; c; z)$, $F(-n, b; -n + b + \tfrac{1}{2}; z)$,

$F(-n, b; \tfrac{3}{2}; z)$, $F(-n, -n - \tfrac{1}{2}; c; z)$, $F(-n, b; -n + b - \tfrac{1}{2}; z)$,

$F(-n, b; -2n; z)$, $F(-n, b; b + n + 1; z)$, $F(-n, n + 1; c; z)$.

The most important polynomial in this class is $F(-n, b; 2b; z)$ because a complete analysis of its zero distribution for all real values of $b$ (cf. [4], [5]) leads to corresponding results for the zeros of the Gegenbauer polynomials $C_n^{\lambda}(z)$ for all real values of the parameter $\lambda$ (cf. [6]).

Theorem 2.1. Let $F = F(-n, b; 2b; z)$ where $b$ is real.

(i) For $b > -\tfrac{1}{2}$, all zeros of $F(-n, b; 2b; z)$ are simple and lie on the circle $|z - 1| = 1$.

(ii) For $-\tfrac{1}{2} - j < b < \tfrac{1}{2} - j$, $j = 1, 2, \ldots, [\tfrac{n}{2}] - 1$, $(n - 2j)$ zeros of $F$ lie on the circle $|z - 1| = 1$. If $j = 2k$ is even, there are $k$ non-real zeros of $F$ in each of the four regions bounded by the circle $|z - 1| = 1$ and the real axis. If $j = 2k + 1$ is odd, there are $k$ non-real zeros of $F$ in each of the four regions described above and the remaining two zeros are real.

(iii) If $n$ is even, for $-[\tfrac{n}{2}] < b < -[\tfrac{n}{2}] + \tfrac{1}{2}$, no zeros of $F$ lie on $|z - 1| = 1$. If $n = 4k$, all zeros of $F$ are non-real, whereas if $n = 4k + 2$, two zeros of $F$ are real and $4k$ are non-real. If $n$ is odd, for $-1 - [\tfrac{n}{2}] < b < -[\tfrac{n}{2}] + \tfrac{1}{2}$, only the fixed real zero of $F$ at $z = 2$ lies on $|z - 1| = 1$. If $n = 4k + 1$, $n - 1 = 4k$ zeros of $F$ are non-real, whereas if $n = 4k + 3$, two further zeros are real and the remaining $4k$ are non-real.

(iv) For $j - n < b < j - n + 1$, $j = 1, 2, \ldots, [\tfrac{n}{2}] - 1$, $(n - 2j)$ zeros of $F$ are real and greater than 1. If $j = 2k$ is even, all remaining $2j$ zeros of $F$ are non-real with $k$ zeros in each of the regions described above; while if $j = 2k + 1$, $4k$ zeros are non-real as before and 2 are real.

(v) For $b < 1 - n$, all zeros of $F(-n, b; 2b; z)$ are real and greater than 1. As $b \to -\infty$, all the zeros of $F$ converge to the point $z = 2$.

An analogous theorem which describes the behaviour of the zeros of $C_n^{\lambda}(z)$ can be found in [6], Section 3 or [7], Theorem 1.2.
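Part (i) of Theorem 2.1, which covers $b > -\tfrac{1}{2}$, is easy to probe numerically. The sketch below computes the zeros of $F(-n, b; 2b; z)$ for one sample choice of $n$ and $b$ and measures their distance from the point $1$; the helper `hyp_poly_coeffs` and the sample values are assumptions made for illustration, and a numerical check of this kind is of course no substitute for the proofs in [4], [5].

```python
import numpy as np

def hyp_poly_coeffs(n, b, c):
    """Coefficients a_0, ..., a_n of F(-n, b; c; z) = sum_k a_k z^k."""
    coeffs = [1.0]
    for k in range(n):
        # a_{k+1} / a_k = (-n + k)(b + k) / ((c + k)(k + 1))
        coeffs.append(coeffs[-1] * (-n + k) * (b + k) / ((c + k) * (k + 1)))
    return np.array(coeffs)

n, b = 6, 1.5                                    # sample values with b > -1/2
zeros = np.roots(hyp_poly_coeffs(n, b, 2 * b)[::-1])
radii = np.abs(zeros - 1.0)                      # distance of each zero from z = 1
print(radii)
```

All entries of `radii` agree with 1 up to rounding, so the computed zeros sit on the circle $|z - 1| = 1$.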

For the polynomial $F(-n, b; \tfrac{1}{2}; z)$ the following result has been proved in [7], Theorem 2.3.

Theorem 2.2. Let $F = F(-n, b; \tfrac{1}{2}; z)$ with $b$ real.

(i) For $b > n - \tfrac{1}{2}$, all $n$ zeros of $F$ are real and simple and lie in $(0, 1)$.

(ii) For $n - \tfrac{1}{2} - j < b < n + \tfrac{1}{2} - j$, $j = 1, 2, \ldots, n - 1$, $(n - j)$ zeros of $F$ lie in $(0, 1)$ and the remaining $j$ zeros of $F$ form $[\tfrac{j}{2}]$ non-real complex pairs of zeros and one real zero lying in $(1, \infty)$ when $j$ is odd.

(iii) For $0 < b < \tfrac{1}{2}$, $F$ has $[\tfrac{n}{2}]$ non-real complex conjugate pairs of zeros with one real zero in $(1, \infty)$ when $n$ is odd.

(iv) For $-j < b < -j + 1$, $j = 1, 2, \ldots, n - 1$, $F$ has exactly $j$ real negative zeros. There is exactly one further real zero greater than 1 only when $(n - j)$ is odd and all the remaining zeros of $F$ are non-real.

(v) For $b < 1 - n$, all zeros of $F$ are real and negative and converge to zero as $b \to -\infty$.

A very similar theorem is proved for the zeros of $F(-n, b; \tfrac{3}{2}; z)$ in [7], Theorem 2.4, with only minor differences of detail.

For the hypergeometric polynomial $F(-n, b; -2n; z)$, less complete results have been proved. We have (cf. [8], Theorem 3.1 and Corollary 3.2) the following.

Theorem 2.3. Let $F = F(-n, b; -2n; z)$ with $b$ real.

(i) For $b > 0$, $F$ has $n$ non-real zeros if $n$ is even, whereas if $n$ is odd, $F$ has exactly one real negative zero and the remaining $(n - 1)$ zeros of $F$ are all non-real.

(ii) For $-n < b < 0$, if $-k < b < -k + 1$, $k = 1, \ldots, n$, $F$ has $k$ real zeros in the interval $(1, \infty)$. In addition, if $(n - k)$ is even, $F$ has $(n - k)$ non-real zeros, whereas if $(n - k)$ is odd, $F$ has one real negative zero and $(n - k - 1)$ non-real zeros.

(iii) For $-n > b > -2n$, if $-n - k > b > -n - k - 1$, $k = 0, 1, \ldots, n - 1$, $F$ has $(n - k)$ real zeros in the interval $(1, \infty)$. In addition, if $k$ is even, $F$ has $k$ non-real zeros, while if $k$ is odd, $F$ has one real zero in $(0, 1)$ and $(k - 1)$ non-real zeros.

(iv) For $b < -2n$, all $n$ zeros of $F$ are non-real for $n$ even, whereas for $n$ odd, $F$ has exactly one real zero in the interval $(0, 1)$.


The identities (cf. [7], Lemma 2.1)

$$F(-n, b; c; 1 - z) = \frac{(c - b)_n}{(c)_n}\, F(-n, b; 1 - n + b - c; z) \tag{2.1}$$

$$F(-n, b; c; z) = \frac{(b)_n}{(c)_n}\, (-z)^n\, F\left(-n, 1 - c - n; 1 - b - n; \frac{1}{z}\right) \tag{2.2}$$

hold for $b$ and $c$ real, $c \notin \{0, -1, \ldots, -n + 1\}$. Applying (2.1) and (2.2) to each of the polynomials $F(-n, b; 2b; z)$, $F(-n, b; \tfrac{1}{2}; z)$, $F(-n, b; \tfrac{3}{2}; z)$ and $F(-n, b; -2n; z)$ in turn, we obtain the remaining eight polynomials in the quadratic class. It is then an easy task to deduce analogous results for their zero distribution.
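The two identities, taken in their standard form with prefactors $(c-b)_n/(c)_n$ in (2.1) and $(b)_n/(c)_n$ in (2.2), can be spot-checked numerically. The sketch below evaluates both sides at a sample point; the helper names and the sample parameters are assumptions chosen so that no lower parameter is a non-positive integer.

```python
import numpy as np

def poch(a, n):
    """Pochhammer symbol (a)_n = a (a+1) ... (a+n-1)."""
    p = 1.0
    for k in range(n):
        p *= a + k
    return p

def hyp_poly(n, b, c, z):
    """Evaluate F(-n, b; c; z) by summing the terminating series."""
    term, total = 1.0, 1.0
    for k in range(n):
        term *= (-n + k) * (b + k) / ((c + k) * (k + 1)) * z
        total += term
    return total

n, b, c, z = 4, 0.7, 1.3, 0.37   # sample values (illustrative)

lhs_21 = hyp_poly(n, b, c, 1 - z)
rhs_21 = poch(c - b, n) / poch(c, n) * hyp_poly(n, b, 1 - n + b - c, z)

lhs_22 = hyp_poly(n, b, c, z)
rhs_22 = poch(b, n) / poch(c, n) * (-z) ** n * hyp_poly(n, 1 - c - n, 1 - b - n, 1 / z)

print(lhs_21, rhs_21)
print(lhs_22, rhs_22)
```

Identity (2.1) exchanges the roles of the points $z = 0$ and $z = 1$, while (2.2) exchanges $z = 0$ and $z = \infty$, which is why they transfer zero-location results from one polynomial of the class to another.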

A  similar  set  of  results  has  been  proved  for  the  sixteen  hypergeometric  polynomials 
in  the  cubic  class.  Again,  this  class  arises  from  a  necessary  and  sufficient  condition  (cf. 
[2],  p.67)  and  details  can  be  found  in  [7]. 


3 The real zeros of F(-n, b; c; z) for b and c real

The results proved below are due to Klein [9] who considered the zeros of more general hypergeometric functions (not necessarily polynomials). Klein's proof is geometric and difficult to penetrate. A more transparent perspective in the polynomial case may be provided by the approach given here.

The classical equation linking the hypergeometric polynomial $F(-n, b; c; z)$ with Jacobi polynomials $P_n^{(\alpha,\beta)}(z)$ is given by (1.2). We will use an alternative expression (cf. [12], p.464, eqn. (142)),

$$F(-n, b; c; z) = \frac{n!\, z^n}{(c)_n}\, P_n^{(\alpha,\beta)}\left(1 - \frac{2}{z}\right), \tag{3.1}$$

where $\alpha = -n - b$ and $\beta = b - c - n$, which is more suited to our analysis. The numbers of real zeros of $P_n^{(\alpha,\beta)}(x)$ in the intervals $(-1, 1)$, $(-\infty, -1)$ and $(1, \infty)$ are given by the Hilbert-Klein formulas (cf. [13], p.145, Theorem 6.72), also known to Stieltjes. We use Klein's symbol

$$E(u) = \begin{cases} 0 & \text{if } u \le 0 \\ [u] & \text{if } u > 0,\ u \neq \text{integer} \\ u - 1 & \text{if } u = 1, 2, 3, \ldots \end{cases}$$

Noting that under the linear fractional transformation $w = 1 - 2/z$, the intervals $1 < w < \infty$, $-\infty < w < -1$ and $-1 < w < 1$ correspond to $-\infty < z < 0$, $0 < z < 1$ and $1 < z < \infty$ respectively, we can use equation (3.1) to rephrase the Hilbert-Klein formulas for hypergeometric polynomials.

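Theorem 3.1. Let $b, c \in \mathbb{R}$ with $b, c, c - b \neq 0, -1, \ldots, -n + 1$. Let

$$X = E\left\{\tfrac{1}{2}\left(|1 - c| - |n + b| - |b - c - n| + 1\right)\right\} \tag{3.2}$$

$$Y = E\left\{\tfrac{1}{2}\left(-|1 - c| + |n + b| - |b - c - n| + 1\right)\right\} \tag{3.3}$$

$$Z = E\left\{\tfrac{1}{2}\left(-|1 - c| - |n + b| + |b - c - n| + 1\right)\right\}. \tag{3.4}$$

Then the numbers of zeros of $F(-n, b; c; z)$ in the intervals $(1, \infty)$, $(0, 1)$ and $(-\infty, 0)$ respectively are

$$N_1 = \begin{cases} 2[(X + 1)/2] & \text{if } (-1)^n \binom{-b}{n} \binom{b - c}{n} > 0 \\ 2[X/2] + 1 & \text{if } (-1)^n \binom{-b}{n} \binom{b - c}{n} < 0 \end{cases} \tag{3.5}$$

$$N_2 = \begin{cases} 2[(Y + 1)/2] & \text{if } \binom{-c}{n} \binom{b - c}{n} > 0 \\ 2[Y/2] + 1 & \text{if } \binom{-c}{n} \binom{b - c}{n} < 0 \end{cases} \tag{3.6}$$

$$N_3 = \begin{cases} 2[(Z + 1)/2] & \text{if } \binom{-c}{n} \binom{-b}{n} > 0 \\ 2[Z/2] + 1 & \text{if } \binom{-c}{n} \binom{-b}{n} < 0. \end{cases} \tag{3.7}$$

Proof: The expressions all follow immediately from the Hilbert-Klein formulas (cf. [13], p.145, Thm. 6.72) together with equation (3.1). □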
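The counting formulas of Theorem 3.1 translate directly into a short program. The sketch below implements Klein's symbol, the signs of the generalized binomial coefficients and the three counts, and compares them with zeros found numerically. All function names are assumptions made for this illustration, and the tolerance used to decide whether a computed zero is real is an ad hoc choice.

```python
import math
import numpy as np

def klein_E(u, tol=1e-9):
    """Klein's symbol: 0 for u <= 0, [u] for positive non-integers, u - 1 for integers."""
    if u <= tol:
        return 0
    r = round(u)
    if abs(u - r) < tol:
        return int(r) - 1
    return math.floor(u)

def binom_sign(x, n):
    """Sign of the generalized binomial coefficient x (x-1) ... (x-n+1) / n!."""
    s = 1
    for i in range(n):
        if x - i == 0:
            raise ValueError("binomial coefficient vanishes; excluded in Theorem 3.1")
        s *= 1 if x - i > 0 else -1
    return s

def hilbert_klein_counts(n, b, c):
    """(N1, N2, N3): zeros of F(-n, b; c; z) in (1, inf), (0, 1), (-inf, 0)."""
    p, q, r = abs(1 - c), abs(n + b), abs(b - c - n)
    X = klein_E(0.5 * (p - q - r + 1))
    Y = klein_E(0.5 * (-p + q - r + 1))
    Z = klein_E(0.5 * (-p - q + r + 1))
    pick = lambda E, sign: 2 * ((E + 1) // 2) if sign > 0 else 2 * (E // 2) + 1
    N1 = pick(X, (-1) ** n * binom_sign(-b, n) * binom_sign(b - c, n))
    N2 = pick(Y, binom_sign(-c, n) * binom_sign(b - c, n))
    N3 = pick(Z, binom_sign(-c, n) * binom_sign(-b, n))
    return N1, N2, N3

def numeric_counts(n, b, c, tol=1e-7):
    """Same three counts, obtained from numerically computed zeros."""
    coeffs = [1.0]
    for k in range(n):
        coeffs.append(coeffs[-1] * (-n + k) * (b + k) / ((c + k) * (k + 1)))
    zeros = np.roots(coeffs[::-1])
    real = zeros[np.abs(zeros.imag) < tol].real
    return (int(np.sum(real > 1)),
            int(np.sum((real > 0) & (real < 1))),
            int(np.sum(real < 0)))

for n, b, c in [(4, 7.3, 1.7), (4, 1.2, -0.5)]:
    print((n, b, c), hilbert_klein_counts(n, b, c), numeric_counts(n, b, c))
```

The first parameter set has $b > c + n$ with $c > 0$, so all four zeros fall in $(0, 1)$; the second has $c < 0 < b$ and yields one zero in $(0, 1)$ and one negative zero, with the remaining two non-real.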

Theorem 3.2. Let $F = F(-n, b; c; z)$ where $b, c \in \mathbb{R}$ and $c > 0$.

(i) For $b > c + n$, all zeros of $F$ are real and lie in the interval $(0, 1)$.

(ii) For $c < b < c + n$, $c + j - 1 < b < c + j$, $j = 1, 2, \ldots, n$, $F$ has $j$ real zeros in $(0, 1)$. The remaining $(n - j)$ zeros of $F$ are all non-real if $(n - j)$ is even, while if $(n - j)$ is odd, $F$ has $(n - j - 1)$ non-real zeros and one additional real zero in $(1, \infty)$.

(iii) For $0 < b < c$, all the zeros of $F$ are non-real if $n$ is even, while if $n$ is odd, $F$ has one real zero in $(1, \infty)$ and the other $(n - 1)$ zeros are non-real.

(iv) For $-n < b < 0$, $-j < b < -j + 1$, $j = 1, 2, \ldots, n$, $F$ has $j$ real negative zeros. The remaining $(n - j)$ zeros of $F$ are all non-real if $(n - j)$ is even, while if $(n - j)$ is odd, $F$ has $(n - j - 1)$ non-real zeros and one additional real zero in $(1, \infty)$.

(v) For $b < -n$, all zeros of $F$ are real and negative.
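The five regimes of Theorem 3.2 can be observed by moving $b$ across the real line for fixed $n$ and $c > 0$. The sketch below does this for $n = 4$, $c = 1.5$ and one sample value of $b$ per regime, counting the numerically computed real zeros in each interval; the helper name, the sample values and the tolerance for classifying a zero as real are assumptions made for illustration.

```python
import numpy as np

def real_zero_counts(n, b, c, tol=1e-7):
    """Numbers of real zeros of F(-n, b; c; z) in (1, inf), (0, 1) and (-inf, 0)."""
    coeffs = [1.0]
    for k in range(n):
        # a_{k+1} / a_k = (-n + k)(b + k) / ((c + k)(k + 1))
        coeffs.append(coeffs[-1] * (-n + k) * (b + k) / ((c + k) * (k + 1)))
    zeros = np.roots(coeffs[::-1])
    real = zeros[np.abs(zeros.imag) < tol].real
    return (int(np.sum(real > 1)),
            int(np.sum((real > 0) & (real < 1))),
            int(np.sum(real < 0)))

n, c = 4, 1.5
samples = {
    "(i)   b > c + n":     6.2,
    "(ii)  c < b < c + n": 3.2,   # c + 1 < b < c + 2, i.e. j = 2
    "(iii) 0 < b < c":     0.8,
    "(iv)  -n < b < 0":   -1.5,   # -2 < b < -1, i.e. j = 2
    "(v)   b < -n":       -5.3,
}
results = {label: real_zero_counts(n, b, c) for label, b in samples.items()}
for label, counts in results.items():
    print(label, counts)
```

For these samples the counts are $(0, 4, 0)$, $(0, 2, 0)$, $(0, 0, 0)$, $(0, 0, 2)$ and $(0, 0, 4)$, matching the statement of the theorem for even $n$ and even $n - j$.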

Proof: We use the identity (cf. [1], p.559, (15.3.4))

$$F(-n, b; c; z) = (1 - z)^n\, F\left(-n, c - b; c; \frac{z}{z - 1}\right) \tag{3.8}$$

to show that (i) $\Rightarrow$ (v) and (ii) $\Rightarrow$ (iv), so that it will suffice to prove (i), (ii) and (iii) above.

(i) $\Rightarrow$ (v): If $b < -n$ then $c - b > c + n$ and by (i), all zeros of $F(-n, c - b; c; w)$ are real and lie in the interval $(0, 1)$. Since $w = z/(z - 1)$ maps $(-\infty, 0)$ to $(0, 1)$, (v) follows from (3.8).

(ii) $\Rightarrow$ (iv): If $-j < b < -j + 1$, $j = 1, 2, \ldots, n$, then $c + j - 1 < c - b < c + j$, $j = 1, 2, \ldots, n$. By (ii), since $w = z/(z - 1)$ maps $(-\infty, 0)$ to $(0, 1)$ and $(1, \infty)$ to $(1, \infty)$, (iv) follows again from (3.8).

To prove (i), (ii) and (iii), we note that in each part, $b > 0$ (and of course $c > 0$ by assumption). Then

$$\operatorname{sign} \binom{-b}{n} = (-1)^n, \qquad \operatorname{sign} \binom{-c}{n} = (-1)^n. \tag{3.9}$$

(i) Suppose $b > c + n$. Then $b - c > n$ and

$$\binom{b - c}{n} > 0 \quad \text{for all } n. \tag{3.10}$$

Considering (3.5), (3.6) and (3.7) with (3.9) and (3.10), we observe that $N_1 = 2[(X + 1)/2]$, $N_3 = 2[(Z + 1)/2]$ and

$$N_2 = \begin{cases} 2[(Y + 1)/2] & \text{for } n \text{ even} \\ 2[Y/2] + 1 & \text{for } n \text{ odd.} \end{cases}$$

Assume now that $c > 1$. Then for $b > c + n$, we have from (3.2), (3.3) and (3.4) that $X = 0$, $Y = n$, $Z = 0$. Substituting these values into $N_1$, $N_2$ and $N_3$ yields the result. A similar calculation shows that the same result is obtained when $0 < c < 1$.
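(ii) For $c + j - 1 < b < c + j$, $j = 1, 2, \ldots, n$, we find that $\operatorname{sign} \binom{b - c}{n} = (-1)^{n - j}$. Then from (3.5), (3.6), (3.7) we see that

$$N_1 = \begin{cases} 2[(X + 1)/2] & \text{for } (n - j) \text{ even} \\ 2[X/2] + 1 & \text{for } (n - j) \text{ odd,} \end{cases} \qquad N_2 = \begin{cases} 2[(Y + 1)/2] & \text{for } j \text{ even} \\ 2[Y/2] + 1 & \text{for } j \text{ odd,} \end{cases} \qquad N_3 = 2[(Z + 1)/2].$$

It follows from (3.2), (3.3) and (3.4) by an easy calculation that $X = 0$, $Y = j$, $Z = 0$ and we deduce that

$$N_1 = \begin{cases} 0 & \text{for } (n - j) \text{ even} \\ 1 & \text{for } (n - j) \text{ odd,} \end{cases} \qquad N_2 = j \quad \text{and} \quad N_3 = 0,$$

which proves (ii).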

(iii) For $0 < b < c$, $\operatorname{sign} \binom{b - c}{n} = (-1)^n$. Then

$$N_1 = \begin{cases} 2[(X + 1)/2] & \text{if } n \text{ is even} \\ 2[X/2] + 1 & \text{if } n \text{ is odd,} \end{cases}$$

$N_2 = 2[(Y + 1)/2]$, $N_3 = 2[(Z + 1)/2]$. Also, we find $X = 0$, $Y = 0$ and $Z = 0$, which completes the proof of (iii) and hence the theorem. □

For $c < 0$, the range of values of $b$ and $c$ that have to be considered can be reduced if we use the identities (2.1) and (2.2). Since the real zeros of $F(-n, b; c; z)$ are now known for all $c > 0$ and $b \in \mathbb{R}$ from Theorem 3.2, it follows from (2.1) that we need only consider $c - b > 1 - n$. Similarly, from (2.2) and Theorem 3.2, we can assume $b > 1 - n$. We split the result for $c < 0$ into the cases where $b > 0$ and $1 - n < b < 0$.


Theorem 3.3. Let $F = F(-n, b; c; z)$. Suppose that $c < 0$, $b > 0$, $c - b > 1 - n$. Then

(i) $1 - n < c - b < 0$ and $0 < b < n - 1$ and $1 - n < c < 0$.

(ii) If $-k < c < -k + 1$, $k = 1, \ldots, n - 1$ and $-j < c - b < -j + 1$, $j = 1, \ldots, n - 1$, then $F(-n, b; c; z)$ has $(j - k) \ge 0$ real zeros in $(0, 1)$. For the remaining $(n - j + k)$ zeros of $F$:

(a) $(n - j + k)$ are non-real if $(n - j)$ and $k$ are even;

(b) $(n - j + k - 1)$ are non-real and one real zero lies in $(1, \infty)$ if $(n - j)$ is odd and $k$ is even;

(c) $(n - j + k - 1)$ are non-real and one zero is real and negative if $(n - j)$ is even and $k$ is odd;

(d) $(n - j + k - 2)$ are non-real if $(n - j)$ is odd and $k$ is odd, with one real negative zero and one real zero in $(1, \infty)$.

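Proof: (i) This follows immediately from $c < 0$, $b > 0$, $c - b > 1 - n$.

(ii) For $c < 0$, $b > 0$, $c - b > 1 - n$, we have

$$|1 - c| = 1 - c, \qquad |b + n| = b + n, \qquad |b - c - n| = c - b + n$$

and it follows from (3.2), (3.3) and (3.4) that

$$X = E(1 - c - n), \qquad Y = E(b), \qquad Z = E(c - b).$$

Since $1 - c - n < 0$ and $c - b < 0$, $X = Z = 0$. Now $\operatorname{sign} \binom{-b}{n} = (-1)^n$ and for $k = 1, \ldots, n - 1$, $-k < c < -k + 1$, $\operatorname{sign} \binom{-c}{n} = (-1)^{n - k}$, while for $-j < c - b < -j + 1$, $j = 1, \ldots, n - 1$, $\operatorname{sign} \binom{b - c}{n} = (-1)^{n - j}$. Therefore, from (3.5), (3.6) and (3.7),

$$N_1 = \begin{cases} 0 & \text{if } (n - j) \text{ is even} \\ 1 & \text{if } (n - j) \text{ is odd,} \end{cases} \tag{3.11}$$

$$N_2 = \begin{cases} 2[(Y + 1)/2] & \text{if } (j - k) \text{ is even} \\ 2[Y/2] + 1 & \text{if } (j - k) \text{ is odd,} \end{cases} \tag{3.12}$$

$$N_3 = \begin{cases} 0 & \text{if } k \text{ is even} \\ 1 & \text{if } k \text{ is odd.} \end{cases} \tag{3.13}$$

Now for $j > b - c > j - 1$ and $-k < c < -k + 1$, $b \in (j - k - 1, j - k + 1)$, $j - k = 1, 2, \ldots, n - 2$. If $b \in (j - k - 1, j - k)$, $Y = E(b) = j - k - 1$, whereas if $b \in (j - k, j - k + 1)$, $Y = E(b) = j - k$. Considering the cases $(j - k)$ even and $(j - k)$ odd, it is straightforward to check that for all $j, k \in \mathbb{N}$ with $j - k = 0, 1, \ldots, n - 2$, we have

$$N_2 = j - k. \tag{3.14}$$

Equations (3.11), (3.12), (3.13) and (3.14) complete the proof of (ii). □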

By virtue of Theorem 3.3 and the identities (2.1), (2.2) and (3.8), it is easy to see that only one possibility is left that has not been analysed, namely

$$1 - n < c - b < 0, \qquad 1 - n < b < 0, \qquad 1 - n < c < 0. \tag{3.15}$$

Theorem 3.4. Let $F = F(-n, b; c; z)$ where $b$ and $c$ satisfy condition (3.15). If $-j < b < -j + 1$, $j = 1, \ldots, n - 1$; $-k < c < -k + 1$, $k = 1, \ldots, n - 1$ and $-\ell < c - b < -\ell + 1$, $\ell = 1, \ldots, n - 1$, then $F$ has no real zeros if $n + j + \ell$, $k + \ell$, $j + k$ are even, one real zero in $(1, \infty)$ if $n + j + \ell$ is odd, one real zero in $(0, 1)$ if $k + \ell$ is odd and one real negative zero if $j + k$ is odd.

Proof: Under the restrictions (3.15), we have

$$|1 - c| = 1 - c, \qquad |b + n| = b + n, \qquad |b - c - n| = c - b + n.$$

Then from (3.2), (3.3) and (3.4),

$$X = E(1 - c - n), \qquad Y = E(b), \qquad Z = E(c - b),$$

and it follows from (3.15) that $X = Y = Z = 0$. Also, $\operatorname{sign} \binom{-b}{n} = (-1)^{n - j}$, $\operatorname{sign} \binom{-c}{n} = (-1)^{n - k}$ and $\operatorname{sign} \binom{b - c}{n} = (-1)^{n - \ell}$. The stated result then follows immediately from (3.5), (3.6) and (3.7). □

Remark 3.1 We have not considered the asymptotic zero distribution as $n \to \infty$ of $F(-n, b; c; z)$. There are recent interesting results in this regard using different approaches, namely complex analysis techniques [10], matrix theoretic tools [11], asymptotic analysis of the Euler integral representation [3] and analysis of coefficients [8].
Abstract

In order to analyze the accuracy of a fixed, finite-dimensional approximation space which is not uniform over its domain $\Omega$, we define the approximation error map, a description of how the error is distributed over $\Omega$, not for a single test function but for a general class of such functions. We show how to compute such a map from the best approximations to an orthonormal basis of the target function space.

1  Introduction 

The expected accuracy of a finite-dimensional approximation space (e.g. a polynomial spline space, or a finite wavelet decomposition) will often vary over its domain $\Omega$. Indeed, adaptive-resolution schemes are based on the premise that refining the element grid in a particular region of $\Omega$ will improve the approximation accuracy in that region.

Knowledge of how the expected approximation error varies over the domain $\Omega$ is obviously relevant to the evaluation of an approximation space, and to the tuning of knot locations, grid geometry, refinement thresholds and other parameters. Towards that goal, we introduce the concept of approximation error map, a description of how the error is distributed over $\Omega$, not for a single test function, but for all functions in some specified space $\mathcal{F}$. We then show how to compute such a map from the best approximations to an orthonormal basis of $\mathcal{F}$.

1.1  Notation  and  definitions 

Let $\mathcal{F}$ and $\mathcal{A}$ be two fixed, finite-dimensional vector spaces, not necessarily disjoint, of
functions defined on some domain $\Omega$ with values in $\mathbb{R}$. Let $\|\cdot\|$ be a vector semi-norm
for the space $\mathcal{A} + \mathcal{F}$. For any function $f \in \mathcal{F}$, we define its best approximation as the
function $f^{\mathcal{A}} \in \mathcal{A}$ that minimizes the error $\|f - f^{\mathcal{A}}\|$.

We refer to $\mathcal{A}$ and $\mathcal{F}$ as the approximation and gauge spaces, respectively. We assume
that the $\|\cdot\|$-balls in the subspace $\mathcal{A}$ are strictly convex, ensuring that the best
approximation always exists and is unique. Since $(\alpha f)^{\mathcal{A}} = \alpha (f^{\mathcal{A}})$ and $\|\alpha f\| = |\alpha| \, \|f\|$ for any
real constant $\alpha$, we can confine the analysis of approximation errors to the unit $\mathcal{F}$-sphere
$\mathcal{F}_1 = \{ f \in \mathcal{F} : \|f\| = 1 \}$.

1.2  Global  error  measures 

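Usually, the effectiveness of the approximation space $\mathcal{A}$ is measured by a single number
$\|f - f^{\mathcal{A}}\|$, either for the worst-case function $f \in \mathcal{F}_1$ or by the root-mean-power average
over all functions $f \in \mathcal{F}_1$,

$$\sigma_{p,\mathcal{A},\mathcal{F}} = \left[ \int_{\mathcal{F}_1} \|f - f^{\mathcal{A}}\|^p \, df \right]^{1/p} \Big/ \left[ \int_{\mathcal{F}_1} df \right]^{1/p}. \tag{1.1}$$

Note that integrals are taken over the function space $\mathcal{F}_1$, not over the domain $\Omega$.

The worst-case error is the limit

$$\mu_{\mathcal{A},\mathcal{F}} = \sigma_{\infty,\mathcal{A},\mathcal{F}} = \sup \{ \|f - f^{\mathcal{A}}\| : f \in \mathcal{F}_1 \}. \tag{1.2}$$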

1.3  Uniform  approximation  spaces 

A global error measure such as $\mu_{\mathcal{A},\mathcal{F}}$ or $\sigma_{p,\mathcal{A},\mathcal{F}}$ is generally sufficient when all points of $\Omega$
are equivalent with respect to the quality of approximation. More formally, we say that
a normed function space $X$ is uniform over $\Omega$ if there is some family $\Phi$ of maps from $\Omega$
to $\Omega$ that preserves $X$ and its norm $\|\cdot\|$, and which can take any point of $\Omega$ to any other
point. A natural example is the set of all harmonic functions on the sphere $\mathbf{S}^d$ of a
given maximum order $n$, with any $L_p$ norm; this space is preserved by the family of rigid
rotations of $\mathbf{S}^d$. Obviously, if both $\mathcal{A}$ and $\mathcal{F}$ are uniform under the same family $\Phi$, then
$\mathcal{A}$ approximates $\mathcal{F}$ equally well at all points of $\Omega$. (Of course, for any specific function
$f \in \mathcal{F}$, the error $f - f^{\mathcal{A}}$ will usually vary over $\Omega$.)

There are, however, many important approximation spaces $\mathcal{A}$ which are not uniform. A
familiar example is the space of polynomials or trigonometric series defined on a bounded
region $\Omega \subset \mathbb{R}^n$. Another example is the space of piecewise polynomial splines of fixed
order and continuity defined over a fixed grid $G$. Wavelet spaces truncated to a fixed
order provide yet another example. For such spaces, the expected approximation error
usually varies over $\Omega$, even when the functions to be approximated are drawn from a
uniform space.

2  Approximation  error  map 

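We define the root mean power approximation error map of $\mathcal{F}$ by $\mathcal{A}$ as the function
$\sigma_{p,\mathcal{A},\mathcal{F}}$ from $\Omega$ to $\mathbb{R}$ defined by

$$\sigma_{p,\mathcal{A},\mathcal{F}}(x) = \left[ \int_{\mathcal{F}_1} |f(x) - f^{\mathcal{A}}(x)|^p \, df \right]^{1/p} \Big/ \left[ \int_{\mathcal{F}_1} df \right]^{1/p}. \tag{2.1}$$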

As before, integrals are taken over the function space $\mathcal{F}_1$, not over the domain $\Omega$. Note
that $\sigma_{p,\mathcal{A},\mathcal{F}}(x)$ is not the error for a specific function $f$, but rather the average error at the
point $x$ for a generic function $f$ in $\mathcal{F}_1$. As a limiting case, we define also the worst-case
approximation error map of $\mathcal{F}$ by $\mathcal{A}$ as the function

$$\mu_{\mathcal{A},\mathcal{F}}(x) = \sigma_{\infty,\mathcal{A},\mathcal{F}}(x) = \sup \{ |f(x) - f^{\mathcal{A}}(x)| : f \in \mathcal{F}_1 \}. \tag{2.2}$$

Again note that the supremum is taken over $\mathcal{F}_1$, not over $\Omega$, and that $\mu_{\mathcal{A},\mathcal{F}}(x)$ is not
the error at $x$ for a single function $f$, but rather the error for the function $f$ in $\mathcal{F}_1$ that
is worst for that particular $x$. A plot of $\sigma_{p,\mathcal{A},\mathcal{F}}(x)$ or $\mu_{\mathcal{A},\mathcal{F}}(x)$ over $\Omega$ should show at a
glance how well $\mathcal{A}$ approximates $\mathcal{F}$ in different parts of the domain, for all functions of
$\mathcal{F}$ at once.


3  Computing  the  approximation  error  map 

Formulas (2.1)-(2.2) become more tractable when the function metric $\|\cdot\|$ is the $L_2$ norm
$\|f\| = \left[ \int_\Omega |f(x)|^2 \, dx \right]^{1/2}$ defined on the space $\mathcal{A} + \mathcal{F}$; in other words, when $\|f\|^2 = \langle f, f \rangle$
where $\langle f, g \rangle = \int_\Omega f(x) g(x) \, dx$. We make this assumption in the remainder of this section.
In that case, $f^{\mathcal{A}}$ is a linear function of $f$, namely the orthogonal projection of $f$ onto the
subspace $\mathcal{A}$; and $\mu_{\mathcal{A},\mathcal{F}}$ is simply $|\sin\theta|$, where $\theta$ is the angle between the two subspaces.

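3.1 Explicit formula for $\sigma$

Let us suppose that $\mathcal{A}$ and $\mathcal{F}$ are disjoint, and let $\phi_1, \ldots, \phi_n$ be an orthonormal basis for
$\mathcal{F}$. Let $\alpha_i = \phi_i^{\mathcal{A}}$ for all $i$, and let $\epsilon_i = \phi_i - \alpha_i$. We will call $\phi$, $\alpha$, and $\epsilon$ the gauge,
approximation, and error bases, respectively (even though the $\alpha_i$ and $\epsilon_i$ need not be independent).
The average error map $\sigma_{p,\mathcal{A},\mathcal{F}}(x)$ can be expressed in terms of the error basis:

$$\sigma_{p,\mathcal{A},\mathcal{F}}(x) = \left[ \frac{1}{\Lambda_n} \int_{\mathbf{S}^{n-1}} \Big| \sum_i c_i \phi_i(x) - \sum_i c_i \alpha_i(x) \Big|^p dc \right]^{1/p} = \left[ \frac{1}{\Lambda_n} \int_{\mathbf{S}^{n-1}} \Big| \sum_i c_i \epsilon_i(x) \Big|^p dc \right]^{1/p}, \tag{3.1}$$

where $\Lambda_n = 2\pi^{n/2} / \Gamma(n/2)$ is the measure of $\mathbf{S}^{n-1}$.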

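Note that $\sum_i c_i \epsilon_i(x)$ is the dot product of the unit vector $c = (c_1, c_2, \ldots, c_n)$ and
the vector $\epsilon(x) = (\epsilon_1(x), \epsilon_2(x), \ldots, \epsilon_n(x))$; it depends only on $|\epsilon(x)|$ and on the angle $\theta$
between those two vectors, and is constant over the slice of $\mathbf{S}^{n-1}$ where $\theta$ is constant.
The measure of that slice is $\Lambda_{n-1} |\sin\theta|^{n-2} \, d\theta$. Therefore,

$$\begin{aligned}
\sigma_{p,\mathcal{A},\mathcal{F}}(x) &= \left[ \frac{1}{\Lambda_n} \int_0^\pi |\epsilon(x)|^p \, |\cos\theta|^p \, \Lambda_{n-1} |\sin\theta|^{n-2} \, d\theta \right]^{1/p} \\
&= |\epsilon(x)| \left[ \frac{\Lambda_{n-1}}{\Lambda_n} \int_0^\pi |\cos\theta|^p \, |\sin\theta|^{n-2} \, d\theta \right]^{1/p} \\
&= |\epsilon(x)| \left[ \frac{\Gamma(\frac{n}{2}) \, \Gamma(\frac{1+p}{2})}{\sqrt{\pi} \; \Gamma(\frac{n+p}{2})} \right]^{1/p}.
\end{aligned} \tag{3.2}$$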
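
The constant factor relating $\sigma_{p,\mathcal{A},\mathcal{F}}(x)$ to $|\epsilon(x)|$ can be checked numerically. The Python sketch below is only an illustration: it assumes that formula (3.2) reads $\sigma = |\epsilon(x)| \, [\Gamma(n/2)\Gamma((1+p)/2) / (\sqrt{\pi}\,\Gamma((n+p)/2))]^{1/p}$, the names `sigma_factor` and `monte_carlo_factor` are hypothetical, and uniform points on $\mathbf{S}^{n-1}$ are drawn by normalizing standard Gaussian vectors.

```python
import math
import random


def sigma_factor(n, p):
    """Factor K such that sigma_p(x) = K * |eps(x)| (assumed form of (3.2))."""
    # Work with log-gamma so that large n or p do not overflow.
    log_k = (math.lgamma(n / 2) + math.lgamma((1 + p) / 2)
             - 0.5 * math.log(math.pi) - math.lgamma((n + p) / 2))
    return math.exp(log_k / p)


def monte_carlo_factor(n, p, samples=50_000, seed=0):
    """Estimate [mean of |c . e|^p]^(1/p) for c uniform on S^(n-1), |e| = 1."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(v * v for v in g))
        # By rotational symmetry we may take e = (1, 0, ..., 0), so c . e = c_1.
        total += abs(g[0] / norm) ** p
    return (total / samples) ** (1.0 / p)


if __name__ == "__main__":
    for n, p in [(3, 1), (3, 2), (8, 2), (8, 4)]:
        print(n, p, sigma_factor(n, p), monte_carlo_factor(n, p))
```

For $p = 2$ the factor reduces to $1/\sqrt{n}$, so the root-mean-square map is the worst-case map divided by $\sqrt{n}$; as $p$ grows the factor tends to 1.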


3.2 Explicit formula for $\mu$

The worst-case error map $\mu_{\mathcal{A},\mathcal{F}}$ can be obtained by taking $p$ to the limit $+\infty$ in
formula (3.2), or directly, as follows. From formula (2.2),

$$\mu_{\mathcal{A},\mathcal{F}}(x) = \sup \{ |f(x) - f^{\mathcal{A}}(x)| : \|f\| = 1 \} = \sup \Big\{ \Big| \sum_i c_i \epsilon_i(x) \Big| : c \in \mathbf{S}^{n-1} \Big\}. \tag{3.3}$$

By considering the effect of negating each $c_i$, it is easy to see that the absolute value in
the last formula is superfluous, i.e.

$$\mu_{\mathcal{A},\mathcal{F}}(x) = \sup \Big\{ \sum_i c_i \epsilon_i(x) : c \in \mathbf{S}^{n-1} \Big\}. \tag{3.4}$$

Formula (3.4) is the supremum of a linear functional with coefficients $\epsilon_i(x)$ over the
sphere $\mathbf{S}^{n-1}$, which is achieved at the point $c^*(x)$ of $\mathbf{S}^{n-1}$ that is collinear with the
coefficient vector, namely $c_i^*(x) = \epsilon_i(x) / \sqrt{\sum_j (\epsilon_j(x))^2}$, whence

$$\mu_{\mathcal{A},\mathcal{F}}(x) = \sum_i c_i^*(x) \epsilon_i(x) = \sqrt{\sum_i (\epsilon_i(x))^2} = |\epsilon(x)|. \tag{3.5}$$

In summary, the error maps $\sigma_{p,\mathcal{A},\mathcal{F}}(x)$ and $\mu_{\mathcal{A},\mathcal{F}}(x)$ (which differ only by a constant
factor) can be derived from the approximation errors $\epsilon_i(x)$ for each basis function $\phi_i(x)$,
combined with the norm $|\epsilon(x)| = \sqrt{\sum_i (\epsilon_i(x))^2}$.
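
The recipe of this section (project each gauge basis function, subtract, and take the pointwise Euclidean norm of the errors) can be sketched in a few lines of Python. The setting below is a deliberately simple stand-in and not one of the paper's examples: the approximation space is assumed to consist of piecewise constants on unequal intervals of the circle, the gauge space is the span of $\cos kx/\sqrt{\pi}$ and $\sin kx/\sqrt{\pi}$ for $k = 1, \ldots, s$, and the $L_2$ inner product is replaced by a sum over `m` uniform samples. The names `error_map` and `sigma2_from_mu` are hypothetical.

```python
import math


def error_map(breaks, s, m=720):
    """Worst-case error map mu(x) = |eps(x)| sampled on the circle [0, 2*pi)."""
    xs = [2.0 * math.pi * j / m for j in range(m)]
    cuts = sorted(breaks)
    # Cell index of each sample; samples before the first breakpoint wrap
    # around into the last cell, because the domain is a circle.
    cells = [(sum(1 for b in cuts if b <= x) - 1) % len(cuts) for x in xs]
    mu_sq = [0.0] * m
    for k in range(1, s + 1):
        for trig in (math.cos, math.sin):
            # Gauge basis function phi_i, orthonormal for the sampled product.
            phi = [trig(k * x) / math.sqrt(math.pi) for x in xs]
            # Projection onto piecewise constants: the mean of phi on each cell.
            sums = [0.0] * len(cuts)
            counts = [0] * len(cuts)
            for c, v in zip(cells, phi):
                sums[c] += v
                counts[c] += 1
            for j, (c, v) in enumerate(zip(cells, phi)):
                eps = v - sums[c] / counts[c]  # eps_i(x) = phi_i(x) - alpha_i(x)
                mu_sq[j] += eps * eps
    return xs, cells, [math.sqrt(v) for v in mu_sq]


def sigma2_from_mu(mu, n):
    """Root-mean-square map for p = 2, where the constant factor is 1/sqrt(n)."""
    return [v / math.sqrt(n) for v in mu]


if __name__ == "__main__":
    breaks = [0.0, 0.3, 0.6, 1.2, 3.2, 3.8, 4.4, 5.2]  # eight unequal cells
    xs, cells, mu = error_map(breaks, s=4)
    for c in range(len(breaks)):
        print(c, max(v for cc, v in zip(cells, mu) if cc == c))
```

In this toy setting the map is visibly larger inside the widest cell than inside the narrowest one, which is the kind of non-uniformity the error map is meant to expose.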

4  Practical  considerations 

4.1  Connection  between  the  function  and  point  norms 

The maps (2.1) and (2.2) will be more useful when there is a direct connection between
the function-space norm $\|\cdot\|$ and the absolute value $|\cdot|$ used to compare function values
at a given point $x$, as in formulas (2.1)-(2.2); namely, when

$$\|f\| = \left[ \int_\Omega |f(x)|^q \, dx \right]^{1/q}. \tag{4.1}$$

More generally, the function values at $x$ could be compared with a norm which could
depend on $x$, or take derivatives of the function into account. We will not pursue such
extensions in this paper.

Connection (4.1) is not strictly necessary, at least when $\mathcal{A}$ and $\mathcal{F}$ are finite-dimensional.
However, it may not make much sense to choose the approximant $f^{\mathcal{A}}$ so as to
minimize the function norm $\|\cdot\|$, and then analyze its accuracy using some other norm
$|\cdot|$, if there is no connection between the two.

Considering that the error map is relatively easy to compute when $\|\cdot\|$ is the $L_2$
norm (see Section 3), and probably intractable otherwise, the connection expressed by
formula (4.1) will probably hold in practice (with $q = 2$).

4.2  Choice  of  the  gauge  space 

The approximation error map depends not only on the space $\mathcal{A}$, but also on the gauge
space $\mathcal{F}$ and the error metric $\|\cdot\|$. Therefore, the choice of $\mathcal{F}$ and $\|\cdot\|$ must be guided by
the intended application.

For example, suppose the domain $\Omega$ is the circle or the sphere $\mathbf{S}^d$, and the application
does not specify a preferred direction. Then we should choose $\mathcal{F}$ and $\|\cdot\|$ so that they
are invariant under rotations of $\Omega$; otherwise, any inhomogeneity in them may produce
irrelevant artifacts in the error map. Also, if the functions to be approximated are expected
to be smooth, and/or only their low frequencies are important, then the functions
in $\mathcal{F}$ should be smooth too. A natural choice for $\mathcal{F}$, in this case, is the space of circular or
spherical harmonics up to a certain maximum order, and the metric $\|\cdot\|$ can be simply
the $L_2$ norm over the sphere $\mathbf{S}^d$.

4.3  Essential  dimensions 

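We will argue next that, for the $L_2$ function norm, the "interesting" part of the error
map is determined by two "essential" subspaces $\mathcal{F}' \subseteq \mathcal{F}$ and $\mathcal{A}' \subseteq \mathcal{A}$, which are disjoint
and such that $\dim \mathcal{F}' \geq \dim \mathcal{A}'$.

First, if the spaces $\mathcal{A}$ and $\mathcal{F}$ have a non-trivial intersection $V$, and we split a function
$f \in \mathcal{F}$ into its components $g \in V$ and $h \perp V$, we find that $f^{\mathcal{A}} = g + h^{\mathcal{A}}$, and that $h^{\mathcal{A}}$
is itself orthogonal to $V$. Therefore, we can confine our attention to the complements $\mathcal{F}'$
and $\mathcal{A}'$ of $V$ relative to $\mathcal{F}$ and $\mathcal{A}$, which are disjoint.

Let us then suppose that $\mathcal{A}$ and $\mathcal{F}$ are disjoint. If $\dim \mathcal{F} < \dim \mathcal{A}$, let $\mathcal{A}' \subseteq \mathcal{A}$ be the
projection of $\mathcal{F}$ onto $\mathcal{A}$, which contains all optimum approximants. Obviously, for any
function $f$, we have $f^{\mathcal{A}} = f^{\mathcal{A}'}$, so we can confine our attention to the space $\mathcal{A}'$, which is
still disjoint from $\mathcal{F}$ and satisfies $\dim \mathcal{F} \geq \dim \mathcal{A}'$.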

5  Examples 

5.1  Trigonometric  splines  on  the  circle 

Consider the approximation of a function by continuous trigonometric splines, of maximum
frequency $r = 2$, defined on a partition $T$ of $\mathbf{S}^1$ into $n = 8$ unequal intervals. This
space coincides with the space $\mathcal{P}^{2}_{0}[T]$ of non-homogeneous polynomial splines of $\mathbb{R}^2$,
restricted to $\mathbf{S}^1$, with $C_0$ continuity constraints [2].

For the gauge space $\mathcal{F}$, we will use the family of trigonometric series truncated after
a suitable maximum frequency $s > r$, which coincides with the space of general spherical
polynomials (not splines) $\mathcal{P}^{s}$ for some $s > r$. The norm is $\|f\| = \sqrt{\langle f, f \rangle}$ where
$\langle f, g \rangle = \int_{\mathbf{S}^1} f(\theta) g(\theta) \, d\theta$. Specifically, $T$ consists of the intervals $J_0$ through $J_7$ shown
below

$t_0 \;\; J_0 \;\; t_1 \;\; J_1 \;\; t_2 \;\; J_2 \;\; t_3 \;\; J_3 \;\; t_4 \;\; J_4 \;\; t_5 \;\; J_5 \;\; t_6 \;\; J_6 \;\; t_7 \;\; J_7 \;\; t_8$

Within each interval $J_j$, the generic approximant is a linear combination $g_j$ of the Fourier
basis functions $\phi_i$, for $-r \leq i \leq +r$. These partial functions are constrained to be
continuous across interval boundaries; i.e. $g_{j-1}(t_j) = g_j(t_j)$ for each $j$ in $\{0, \ldots, n - 1\}$
(where all indices are taken modulo $n$). These equations turn out to be independent,
therefore the dimension of $\mathcal{A}$ is $n(2r + 1) - n = 32$.

For the gauge space $\mathcal{F}$, we will use the trigonometric polynomials of some order
$s > r$, i.e. linear combinations of the basis functions $\phi_i$ for $-s \leq i \leq +s$, where $\phi_i(\theta) =
(1/\sqrt{\pi}) \sin(i\theta + \pi/4)$. As observed in Section 4.3, we can ignore the subspace $\mathcal{A}' = \mathcal{F} \cap \mathcal{A}$ of
$\mathcal{A}$ generated by $\phi_{-r}, \ldots, \phi_r$. Moreover, in order to use all of $\mathcal{A}$, we need $\dim \mathcal{F} \geq \dim \mathcal{A}$,
i.e., $2s + 1 \geq 32$, implying $s \geq 16$. See Figure 1. The resulting error map is shown
in Figure 2.
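
The dimension count behind this choice of $s$ is simple arithmetic, sketched below in Python. The helper names `trig_spline_dim` and `min_gauge_order` are hypothetical; the counts follow the text: $2r + 1$ Fourier coefficients per interval, one continuity constraint per breakpoint, and $2s + 1$ basis functions in the gauge space.

```python
def trig_spline_dim(n, r):
    """Dimension of C0 trigonometric splines: n pieces with 2r + 1 coefficients
    each, minus n independent continuity constraints."""
    return n * (2 * r + 1) - n


def min_gauge_order(dim_a):
    """Smallest order s such that the gauge space, of dimension 2s + 1,
    is at least as large as the approximation space."""
    s = 0
    while 2 * s + 1 < dim_a:
        s += 1
    return s


if __name__ == "__main__":
    dim_a = trig_spline_dim(n=8, r=2)
    print(dim_a)                   # 32
    print(min_gauge_order(dim_a))  # 16
```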


Fig. 1. The functions $\phi_i(t)$, $\alpha_i(t)$, and $\epsilon_i(t)$, for selected values of $i$.


Fig. 2. The error map $\mu_{\mathcal{A},\mathcal{F}}(t)$ for continuous ($C_0$) trigonometric splines on eight
unequal intervals, tested with the space of trigonometric polynomials of order 16.

5.2  Spherical  splines  on  a  uniform  mesh 

For the examples in this section, the approximating functions are spherical polynomial
splines [1, 2, 3, 4] of continuity class zero and various degrees, homogeneous and
non-homogeneous, defined on some triangulation $T$ of the sphere $\mathbf{S}^2$.

Figure 3 (left) shows the approximation error map $\mu_{\mathcal{A},\mathcal{F}}(p)$ for the homogeneous
spherical spline space $\mathcal{A} = \mathcal{H}^5_0[T]/\mathbf{S}^2$, which has dimension 252. In Figure 3 (right), $\mathcal{A}$
is the non-homogeneous spherical spline space $\mathcal{P}^4_0[T]/\mathbf{S}^2$, which has dimension 254. In
both cases, the gauge space $\mathcal{F}$ is the family $\mathcal{Y}^{15}$ of spherical harmonics of maximum order
15, which has dimension 256. The intersection $\mathcal{F} \cap \mathcal{H}^5_0[T]/\mathbf{S}^2$ is the family of spherical
harmonics of odd order $\leq 5$ (dimension 21), whereas $\mathcal{F} \cap \mathcal{P}^4_0[T]/\mathbf{S}^2$ is the full harmonic
space $\mathcal{Y}^4$ (dimension 25). The level curves are logarithmically spaced, five per decade.

Fig. 3. Error maps for the approximation spaces $\mathcal{A} = \mathcal{H}^5_0[T]/\mathbf{S}^2$ (left) and
$\mathcal{A} = \mathcal{P}^4_0[T]/\mathbf{S}^2$ (right). The maximum errors are 13.5 and 9.37, respectively.
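
The dimensions quoted in this section can be cross-checked with a short Python sketch. The harmonic counts use only the standard fact that the spherical harmonics of order $l$ on $\mathbf{S}^2$ span a space of dimension $2l + 1$. The spline counts rest on assumptions that are not stated in the text: a mesh with icosahedral topology (12 vertices, 30 edges, 20 triangles), the usual domain-point count for $C_0$ homogeneous splines of degree $d$, a homogeneous space of degree 5, and a non-homogeneous space of degree 4 taken as the sum of the homogeneous spaces of degrees 4 and 3. These choices are made here only because they reproduce the stated numbers; the function names are hypothetical.

```python
def harmonic_dim(orders):
    """Dimension of the span of spherical harmonics of the given orders."""
    return sum(2 * l + 1 for l in orders)


def c0_homogeneous_spline_dim(d, vertices=12, edges=30, triangles=20):
    """Assumed domain-point count for C0 homogeneous splines of degree d:
    one point per vertex, d - 1 per edge, (d - 1)(d - 2)/2 per triangle."""
    return vertices + (d - 1) * edges + (d - 1) * (d - 2) // 2 * triangles


if __name__ == "__main__":
    print(harmonic_dim(range(16)))       # 256: all orders 0..15
    print(harmonic_dim([1, 3, 5]))       # 21: odd orders up to 5
    print(harmonic_dim(range(5)))        # 25: all orders 0..4
    print(c0_homogeneous_spline_dim(5))  # 252
    print(c0_homogeneous_spline_dim(4) + c0_homogeneous_spline_dim(3))  # 254
```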

5.3  Spherical  splines  on  a  variable  mesh 

In  the  following  examples,  the  approximating  functions  are  again  spherical  polynomial 
splines,  but  the  vertices  of  the  triangulation  T  have  been  displaced  so  as  to  create  regions 
of  very  different  sizes  (still  with  icosahedral  topology). 

Figure 4 (left) shows the approximation error map $\mu_{\mathcal{A},\mathcal{F}}(p)$ for the space of homogeneous
spherical splines $\mathcal{A} = \mathcal{H}^5_0[T]/\mathbf{S}^2$, which has dimension 252. In Figure 4 (right), $\mathcal{A}$
is the space of non-homogeneous spherical splines $\mathcal{P}^4_0[T]/\mathbf{S}^2$, which has dimension 254.
In both cases, the gauge space $\mathcal{F}$ is the family $\mathcal{Y}^{15}$ of spherical harmonics of maximum
order 15, which has dimension 256, as before. The level curves are logarithmically spaced
(5 per decade).

6  Conclusion 

Asymptotic error analysis is not very helpful when comparing two fixed finite-dimensional
approximation spaces of similar dimensions, such as a spline space against a wavelet
space, or two spline spaces with different grid geometries. Approximation errors computed
for individual test functions are difficult to interpret and may not be representative
of the average or worst cases. We expect that the approximation error map will be a useful
analysis tool for those situations, especially for domains that admit natural uniform
target spaces, such as spheres (including the circle) and tori.

Fig. 4. Error maps $\mu_{\mathcal{A},\mathcal{F}}(p)$ for the approximation spaces $\mathcal{A} = \mathcal{H}^5_0[T]/\mathbf{S}^2$ (left) and
$\mathcal{A} = \mathcal{P}^4_0[T]/\mathbf{S}^2$ (right). The maximum errors are 17.1 and 17.9, respectively.

1 Introduction

The classical perceptron proposed by Rosenblatt [22] as a simplified model of a neuron computes a
weighted sum of its inputs and, after comparing it with a threshold, applies an activation function
representing a rate of neuron firing. To model this rate, Rosenblatt used the discontinuous Heaviside
threshold function, which still is, together with its various continuous approximations, the most
widespread type of activation used in neurocomputing. Formally, a perceptron with the Heaviside
activation function computes a characteristic function of a half-space of $\mathcal{R}^d$, which is for practical
reasons (all inputs are bounded) restricted to a box, usually $[0,1]^d$. Thus theoretical study of perceptron
networks leads to various questions concerning approximation of functions by a special class
of plane waves formed by linear combinations of characteristic functions of half-spaces (corresponding
to the simplest model of perceptron network called the one-hidden-layer network with a linear
output unit).

Although Rosenblatt’s model was inspired biologically, plane waves (sometimes called ridge functions)
have been studied for a long time by mathematicians motivated by various problems from
physics. In contrast to integration theory, where functions are approximated by linear combinations
of characteristic functions of boxes (simple functions), the theory of perceptron networks studies
approximation of multivariable functions by linear combinations of characteristic functions of
half-spaces. Expressions in terms of such functions exhibit the strength and weakness of plane wave
methods described by Courant and Hilbert [4], page 676: “But always the use of plane waves fails to
exhibit clearly the domains of dependence and the role of characteristics. This shortcoming, however,
is compensated by the elegance of explicit results.”

In  this  paper  we  survey  our  recent  results  on  properties  of  approximation  by  linear  combinations 
of  characteristic  functions  of  half-spaces.  We  focus  on  existence  of  best  approximation,  impossibility 
of  choosing  among  best  approximations  a  continuous  one,  estimates  of  rates  of  approximation  by 
linear  combinations  of  n  characteristic  functions  of  half-spaces  and  integral  representation  as  a  linear 
combination  of  a  continuum  of  half-spaces. 


This work was partially supported by GA CR 201/99/0092 and 201/02/0428.

2 Preliminaries

A perceptron with an activation function $\psi : \mathcal{R} \to \mathcal{R}$ (where $\mathcal{R}$ denotes the set of real numbers)
computes real-valued functions on $\mathcal{R}^d \times \mathcal{R}^{d+1}$ of the form $\psi(v \cdot x + b)$, where $x \in \mathcal{R}^d$ is an input
vector, $v \in \mathcal{R}^d$ is an input weight vector and $b \in \mathcal{R}$ is a bias.

The most common activation functions are sigmoidals, i.e., functions with an ess-shaped graph.
Both continuous and discontinuous sigmoidals are used. Here, we study networks based on the
discontinuous Heaviside function $\vartheta$ defined by $\vartheta(t) = 0$ for $t < 0$ and $\vartheta(t) = 1$ for $t \geq 0$. Let $H_d$
denote the set of functions on $[0,1]^d$ computable by Heaviside perceptrons, i.e.,

$$H_d = \{ f : [0,1]^d \to \mathcal{R} \mid f(x) = \vartheta(v \cdot x + b), \; v \in \mathcal{R}^d, \; b \in \mathcal{R} \}.$$

Notice that $H_d$ is the set of characteristic functions of half-spaces of $\mathcal{R}^d$ restricted to $[0,1]^d$.

For all positive integers $d$, $H_d$ is compact in $(\mathcal{L}_p([0,1]^d), \|\cdot\|_p)$ with $p \in [1,\infty)$ (see, e.g., [8]). This
can be verified easily once the set $H_d$ is reparameterized by elements of the unit sphere $S^d$ in $\mathcal{R}^{d+1}$.
Indeed, a function $\vartheta(v \cdot x + b)$, with a non-zero vector $(v_1, \ldots, v_d, b) \in \mathcal{R}^{d+1}$, is equal to $\vartheta(\hat{v} \cdot x + \hat{b})$,
where $(\hat{v}_1, \ldots, \hat{v}_d, \hat{b}) \in S^d$ is obtained from $(v_1, \ldots, v_d, b) \in \mathcal{R}^{d+1}$ by normalization.
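
The reparameterization argument can be illustrated with a minimal Python sketch. It assumes the convention $\vartheta(t) = 1$ for $t \geq 0$; the helper names `heaviside`, `perceptron` and `normalize` are hypothetical. Dividing $(v, b)$ by its Euclidean norm rescales $v \cdot x + b$ by a positive constant, so the sign, and hence the computed function, is unchanged.

```python
import math


def heaviside(t):
    """Discontinuous threshold: 0 for t < 0, 1 for t >= 0."""
    return 1.0 if t >= 0 else 0.0


def perceptron(v, b):
    """Return the function x -> heaviside(v . x + b)."""
    def f(x):
        return heaviside(sum(vi * xi for vi, xi in zip(v, x)) + b)
    return f


def normalize(v, b):
    """Scale the parameter vector (v_1, ..., v_d, b) onto the unit sphere S^d."""
    norm = math.sqrt(sum(vi * vi for vi in v) + b * b)
    if norm == 0.0:
        raise ValueError("the parameter vector must be non-zero")
    return [vi / norm for vi in v], b / norm


if __name__ == "__main__":
    v, b = [3.0, -4.0], 0.3
    f = perceptron(v, b)
    g = perceptron(*normalize(v, b))
    grid = [i / 4 for i in range(5)]
    print(all(f((x, y)) == g((x, y)) for x in grid for y in grid))
```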

The simplest type of multilayer feedforward network has one hidden layer and one linear output.
Such networks with Heaviside perceptrons in the hidden layer compute functions of the form

$$\sum_{i=1}^{n} w_i \vartheta(v_i \cdot x + b_i),$$

where $n$ is the number of hidden units, $w_i \in \mathcal{R}$ are output weights and $v_i \in \mathcal{R}^d$ and $b_i \in \mathcal{R}$ are input
weights and biases, respectively. The set of all such functions is the set of all linear combinations of
$n$ elements of $H_d$ and is denoted by $\mathrm{span}_n H_d$.
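
A member of $\mathrm{span}_n H_d$ is straightforward to evaluate. The sketch below is a minimal illustration, with the hypothetical name `heaviside_network` and the convention $\vartheta(0) = 1$ assumed; it builds the function $x \mapsto \sum_i w_i \vartheta(v_i \cdot x + b_i)$ from lists of output weights, input weight vectors and biases.

```python
def heaviside_network(weights, vs, bs):
    """One-hidden-layer network with Heaviside hidden units and a linear output."""
    if not (len(weights) == len(vs) == len(bs)):
        raise ValueError("weights, vs and bs must have the same length")

    def f(x):
        total = 0.0
        for w, v, b in zip(weights, vs, bs):
            activation = sum(vi * xi for vi, xi in zip(v, x)) + b
            total += w * (1.0 if activation >= 0 else 0.0)
        return total

    return f


if __name__ == "__main__":
    # Two hidden units in dimension d = 1: the indicator of [0.25, 0.75).
    bump = heaviside_network([1.0, -1.0], [[1.0], [1.0]], [-0.25, -0.75])
    print([bump([t]) for t in (0.1, 0.5, 0.9)])  # [0.0, 1.0, 0.0]
```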

For all positive integers $d$, $\bigcup_{n \in \mathcal{N}_+} \mathrm{span}_n H_d$ (where $\mathcal{N}_+$ denotes the set of all positive integers) is
dense in $(\mathcal{C}([0,1]^d), \|\cdot\|_{\mathcal{C}})$, the linear space of all continuous functions on $[0,1]^d$ with the supremum
norm, as well as in $(\mathcal{L}_p([0,1]^d), \|\cdot\|_p)$ with $p \in [1,\infty)$ (see, e.g., [5, 9]).

3  Existence  of  a  best  approximation 

A subset $M$ of a normed linear space $(X, \|\cdot\|)$ is called proximinal if for every $f \in X$ the distance
$\|f - M\| = \inf_{g \in M} \|f - g\|$ is achieved for some element of $M$, i.e., $\|f - M\| = \min_{g \in M} \|f - g\|$ (see,
e.g., [23]). Clearly, a proximinal subset must be closed.

A sufficient condition for proximinality of a subset $M$ of a normed linear space $(X, \|\cdot\|)$ is
compactness or bounded compactness. However, by extending $H_d$ into $\mathrm{span}_n H_d$ for any positive
integer $n$ we lose compactness. Nevertheless, compactness can be replaced by a weaker property
that requires only those sequences that “minimize” a distance from $M$ of an element of $X$ to have
convergent subsequences. More precisely, a subset $M$ of a normed linear space $(X, \|\cdot\|)$ is called
approximatively compact if for each $f \in X$ and any sequence $\{g_i : i \in \mathcal{N}_+\} \subseteq M$ such that
$\lim_{i \to \infty} \|f - g_i\| = \|f - M\|$, there exists $g \in M$ such that $\{g_i : i \in \mathcal{N}_+\}$ converges subsequentially
to $g$ (see, e.g., [23], p. 368). The following theorem is from [16].

Theorem 3.1 For all $n$, $d$ positive integers, $\mathrm{span}_n H_d$ is an approximatively compact subset of
$(\mathcal{L}_p([0,1]^d), \|\cdot\|_p)$ with $p \in [1, \infty)$.

The proof is based on an argument showing that any sequence of elements of $\mathrm{span}_n H_d$ has a
subsequence that either converges to an element of $\mathrm{span}_n H_d$ or to a Dirac delta distribution, and
the latter case cannot occur when such a sequence “minimizes” a distance from some function in
$\mathcal{L}_p([0,1]^d)$.

It follows directly from the definitions that each approximatively compact subset is proximinal.

Corollary 3.2 For all $n$, $d$ positive integers, $\mathrm{span}_n H_d$ is a proximinal subset of $(\mathcal{L}_p([0,1]^d), \|\cdot\|_p)$
with $p \in [1, \infty)$.

Thus, for any fixed number $n$, a function in $\mathcal{L}_p([0,1]^d)$ has a best approximation among functions
computable by a linear combination of $n$ characteristic functions of half-spaces.

4  Uniqueness  and  continuity  of  a  best  approximation 

Let $M$ be a subset of a normed linear space $(X, \|\cdot\|)$ and let $\mathcal{P}(M)$ denote the set of all subsets of
$M$. The set-valued mapping $P_M : X \to \mathcal{P}(M)$ defined by $P_M(f) = \{ g \in M : \|f - g\| = \|f - M\| \}$
is called the metric projection of $X$ onto $M$ and $P_M(f)$ is called the projection of $f$ onto $M$.

Let $F : X \to \mathcal{P}(M)$ be a set-valued mapping. A selection from $F$ is a mapping $\phi : X \to M$ such
that for all $f \in X$, $\phi(f) \in F(f)$. A mapping $\phi : X \to M$ is called a best approximation operator
from $X$ to $M$ if it is a selection from $P_M$.

When $M$ is proximinal, then $P_M(f)$ is non-empty for all $f \in X$ and so there exists a best
approximation mapping from $X$ to $M$. The best approximation need not be unique. When it is
unique, $M$ is called a Chebyshev set (or “unicity” set). Thus $M$ is Chebyshev if for all $f \in X$ the
projection $P_M(f)$ is a singleton.
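
These definitions can be made concrete with a toy case that is far simpler than $\mathrm{span}_n H_d$ and is chosen here purely for illustration: a two-point set $M$ in the Euclidean plane. The Python sketch below (the name `metric_projection` and the tolerance `tol` are hypothetical) returns the set $P_M(f)$. The set is proximinal but not Chebyshev, because points on the perpendicular bisector have two best approximations, and any selection must jump as $f$ crosses that bisector.

```python
import math


def metric_projection(f, M, tol=1e-12):
    """Return all points of the finite set M at minimal Euclidean distance from f."""
    dists = [math.dist(f, g) for g in M]
    best = min(dists)
    return [g for g, d in zip(M, dists) if d - best <= tol]


if __name__ == "__main__":
    M = [(-1.0, 0.0), (1.0, 0.0)]
    print(metric_projection((0.0, 1.0), M))   # both points: not a singleton
    print(metric_projection((0.1, 1.0), M))   # only (1.0, 0.0)
    print(metric_projection((-0.1, 1.0), M))  # only (-1.0, 0.0)
```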

Recall that a normed linear space $(X, \|\cdot\|)$ is called strictly convex (also called “rotund”) if for
all $f \neq g$ in $X$ with $\|f\| = \|g\| = 1$ we have $\|(f + g)/2\| < 1$. It is well known that for all $p \in (1, \infty)$,
$(\mathcal{L}_p([0,1]^d), \|\cdot\|_p)$ is strictly convex.

The following theorem from [13] implies for $p$ in the open interval $(1, \infty)$ that if among the best
approximation operators to $\mathrm{span}_n H_d$ (the existence of which is guaranteed by Corollary 3.2) there is a
continuous one, then $\mathrm{span}_n H_d$ must be a Chebyshev set.

Theorem 4.1 In a strictly convex normed linear space, any subset with a continuous selection
from its metric projection is Chebyshev.

We shall combine this theorem with the following geometric characterization of Chebyshev sets
with a continuous best approximation from [24].

Theorem 4.2 In a Banach space with strictly convex dual, every Chebyshev subset with continuous
metric projection is convex.

It is well known that $\mathcal{L}_p$-spaces with $p \in (1,\infty)$ satisfy the assumptions of this theorem (since
the dual of $\mathcal{L}_p$ is $\mathcal{L}_q$ where $1/p + 1/q = 1$ and $q \in (1,\infty)$) (see, e.g., [7], p. 160). Hence, to show the
non-existence of a continuous selection, it is sufficient to verify that $\mathrm{span}_n H_d$ is not convex.

Proposition 4.3 For all $n$, $d$ positive integers, $\mathrm{span}_n H_d$ is not convex.

Indeed, consider $2n$ parallel half-spaces with the characteristic functions $\vartheta_i(x) = \vartheta(v \cdot x + b_i)$,
where $0 > b_1 > \ldots > b_{2n} > -1$ and $v = (1, 0, \ldots, 0) \in \mathcal{R}^d$. Then $\frac{1}{2} \sum_{i=1}^{2n} \vartheta_i$, a convex combination
of two elements of $\mathrm{span}_n H_d$, $\sum_{i=1}^{n} \vartheta_i$ and $\sum_{i=n+1}^{2n} \vartheta_i$, is not in $\mathrm{span}_n H_d$, since its restriction
to the one-dimensional set $\{ (t, 0, \ldots, 0) \in \mathcal{R}^d : t \in [0,1] \}$ has $2n$ discontinuities.
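
The counting argument can be reproduced numerically along the first coordinate axis. In the Python sketch below, the thresholds $b_i = -i/(2n+1)$ are one arbitrary choice satisfying $0 > b_1 > \ldots > b_{2n} > -1$, the helper names are hypothetical, and discontinuities are detected as value changes on a uniform grid, which is adequate only because the jumps are well separated.

```python
def heaviside(t):
    return 1.0 if t >= 0 else 0.0


def count_jumps(g, steps=1000):
    """Count value changes of g on the grid j / steps, j = 0..steps."""
    values = [g(j / steps) for j in range(steps + 1)]
    return sum(1 for a, b in zip(values, values[1:]) if a != b)


def nonconvexity_demo(n):
    """Return the jump counts of the two n-term sums and of their midpoint."""
    bs = [-(i + 1) / (2 * n + 1) for i in range(2 * n)]

    def first(t):   # sum of theta_1 .. theta_n restricted to the axis
        return sum(heaviside(t + b) for b in bs[:n])

    def second(t):  # sum of theta_(n+1) .. theta_(2n)
        return sum(heaviside(t + b) for b in bs[n:])

    def midpoint(t):
        return 0.5 * (first(t) + second(t))

    return count_jumps(first), count_jumps(second), count_jumps(midpoint)


if __name__ == "__main__":
    print(nonconvexity_demo(2))  # (2, 2, 4)
```

Each summand has $n$ jumps, while their midpoint has $2n$, more than a combination of only $n$ half-space indicators can produce along a line.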

Summarizing results of this section and the previous one, we get the following corollary.

Corollary 4.4 In $(\mathcal{L}_p([0,1]^d), \|\cdot\|_p)$ with $p \in (1, \infty)$, for all $n$, $d$ positive integers there exists a best
approximation mapping from $\mathcal{L}_p([0,1]^d)$ to $\mathrm{span}_n H_d$, but no such mapping is continuous.

Thus convenient properties of projection operators such as uniqueness and continuity are not
satisfied by $\mathrm{span}_n H_d$. These properties would allow one to estimate worst-case errors using methods
of algebraic topology (see, e.g., [6]). In linear approximation theory, application of such methods
shows that some sets of functions defined by smoothness conditions exhibit the curse of dimensionality:
the approximants converge at rate $O(1/\sqrt[d]{n})$, where $d$ is the number of variables and $n$ is the
dimension of the approximating linear space (see, e.g., [20]). Our results show that these arguments
are not applicable to approximation by $\mathrm{span}_n H_d$.
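
To get a feeling for how quickly a rate of order $n^{-1/d}$ degrades with the number of variables, the following Python sketch tabulates it for a fixed $n$. It ignores the constant hidden in the $O(\cdot)$ notation, so the numbers indicate scaling only; the function name `linear_rate` is hypothetical.

```python
def linear_rate(n, d):
    """The quantity n^(-1/d), with the constant in O(.) ignored."""
    return n ** (-1.0 / d)


if __name__ == "__main__":
    n = 10 ** 6
    for d in (1, 2, 5, 10, 20):
        print(d, linear_rate(n, d))
```

With a million basis functions the quantity is about $10^{-3}$ for $d = 2$ but stays above $1/2$ for $d = 20$.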

5  Rates  of  approximation 

Let $(X, \|\cdot\|)$ be a normed linear space and $G$ be its subset. Then $G$-variation (variation with respect
to $G$) is defined as the Minkowski functional of the set $\mathrm{cl\,conv}(G \cup -G)$, i.e.,

$$\|f\|_G = \inf \{ c \in \mathcal{R}_+ : f/c \in \mathrm{cl\,conv}(G \cup -G) \}.$$

Variation with respect to $G$ is a norm on the subspace $\{ f \in X : \|f\|_G < \infty \} \subseteq X$. The closure in
its definition depends on the topology induced on $X$ by the norm $\|\cdot\|$. When $X$ is finite-dimensional,
$G$-variation does not depend on the choice of a norm on $X$, since all norms on a finite-dimensional
space are topologically equivalent.

Variation with respect to $G$ was introduced in [17] as an extension of the concept from [1]
of $H_d$-variation, called variation with respect to half-spaces. For functions of one variable, variation
with respect to half-spaces coincides, up to a constant, with the notion of total variation studied in
integration theory (see [1]). For $G$ countable orthonormal, it coincides with the $\ell_1$-norm with respect to
$G$ (see [18]).

The following theorem from [17] is a reformulation of the Maurey-Jones-Barron Theorem (see [2],
[10], [21]) on estimates of rates of approximation of the order of $O(1/\sqrt{n})$.

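Theorem 5.1 Let $(X, \|\cdot\|)$ be a Hilbert space, $G$ be its subset and $s_G = \sup_{g \in G} \|g\|$. Then for every
$f \in X$ and for every positive integer $n$,

$$\|f - \mathrm{span}_n G\| \leq \sqrt{\frac{(s_G \|f\|_G)^2 - \|f\|^2}{n}}.$$

Corollary 5.2 For all positive integers $d$, $n$ and for every $f \in (\mathcal{L}_2([0,1]^d), \|\cdot\|_2)$,

$$\|f - \mathrm{span}_n H_d\|_2 \leq \sqrt{\frac{\|f\|_{H_d}^2 - \|f\|_2^2}{n}}.$$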

Thus the worst-case error in approximation of functions from the unit ball in $H_d$-variation by linear
combinations of characteristic functions of $n$ half-spaces of $[0,1]^d$ is at most $1/\sqrt{n}$. Estimates derived
from Theorem 5.1 are sometimes called “dimension-independent”, which is misleading since with
increasing number of variables, the condition of being in the unit ball in $G$-variation becomes
more and more constraining. See [19] for examples of smooth functions with $H_d$-variation growing
exponentially with the number of variables $d$. However, such exponentially growing lower bounds
on variation with respect to half-spaces are merely lower bounds on upper bounds on rates of
approximation by $\mathrm{span}_n H_d$; they do not prove that such functions cannot be approximated with
faster rates than $\|f\|_{H_d}/\sqrt{n}$. Finding whether these exponentially large upper bounds are tight
seems to be a difficult task related to some open problems in the theory of complexity of Boolean
circuits.
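
A bound of this type can be tried out in a finite-dimensional setting where everything is computable. The Python sketch below is an illustration under explicit assumptions: $X$ is Euclidean space, $G$ is the standard orthonormal basis (so $s_G = 1$ and, as noted above, $G$-variation is the $\ell_1$-norm), the best approximation from $\mathrm{span}_n G$ keeps the $n$ largest coefficients in absolute value, and the bound is taken in the form $\sqrt{((s_G \|f\|_G)^2 - \|f\|^2)/n}$ usually attributed to Maurey, Jones and Barron. All function names are hypothetical.

```python
import math
import random


def l1_norm(f):
    """G-variation of f when G is the standard orthonormal basis."""
    return sum(abs(v) for v in f)


def best_n_term_error(f, n):
    """Distance from f to span_n G: drop all but the n largest |coefficients|."""
    tail = sorted((abs(v) for v in f), reverse=True)[n:]
    return math.sqrt(sum(v * v for v in tail))


def mjb_bound(f, n, s_g=1.0):
    """Assumed form of the bound: sqrt(((s_G ||f||_G)^2 - ||f||^2) / n)."""
    gap = (s_g * l1_norm(f)) ** 2 - sum(v * v for v in f)
    return math.sqrt(max(gap, 0.0) / n)  # clamp rounding noise below zero


if __name__ == "__main__":
    rng = random.Random(1)
    f = [rng.uniform(-1.0, 1.0) for _ in range(50)]
    for n in (1, 2, 5, 10, 25, 50):
        print(n, best_n_term_error(f, n), mjb_bound(f, n))
```

In this setting the actual error is typically well below the bound, which is consistent with the remark that such estimates are upper bounds only.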

Some insight into the behavior of $H_d$-variation is given by its geometric characterization, derived in [19]
using the Hahn-Banach Theorem.

Theorem 5.3 Let $(X, \|\cdot\|)$ be a Hilbert space and $G$ be its nonempty subset. Then for every $f \in X$,

$$\|f\|_G = \sup_{h \in S} \frac{|f \cdot h|}{\sup_{g \in G} |g \cdot h|}, \quad \text{where } S = \{ h \in X - G^{\perp} : \|h\| = 1 \}.$$

Thus functions that are “almost orthogonal” to $H_d$ (i.e., have small inner products with
characteristic functions of half-spaces) have large $H_d$-variation.
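
A characterization of this kind is easy to test when $G$ is the standard orthonormal basis of a Euclidean space, where $G$-variation is the $\ell_1$-norm. The Python sketch below assumes the quotient form $|f \cdot h| / \sup_{g \in G} |g \cdot h|$; with this $G$ the denominator is $\max_i |h_i|$. Every direction $h$ then gives a lower bound on $\|f\|_1$, and the sign pattern of $f$ attains it. The function names are hypothetical.

```python
import random


def l1_norm(f):
    return sum(abs(a) for a in f)


def variation_quotient(f, h):
    """|f . h| / max_i |h_i|, the quotient for G = standard basis."""
    numerator = abs(sum(a * b for a, b in zip(f, h)))
    denominator = max(abs(b) for b in h)
    return numerator / denominator


if __name__ == "__main__":
    f = [0.5, -2.0, 1.25]
    signs = [1.0 if a >= 0 else -1.0 for a in f]
    print(l1_norm(f), variation_quotient(f, signs))  # both 3.75
    rng = random.Random(0)
    h = [rng.gauss(0.0, 1.0) for _ in f]
    print(variation_quotient(f, h) <= l1_norm(f))    # True
```

The quotient is scale-invariant in $h$, so the normalization $\|h\| = 1$ is not needed in the computation.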

6  Integral  representation 

The  following  theorem  from  [14]  shows  that  a  smooth  real- valued  function  on  7 Zd  with  compact 
support  can  be  represented  as  an  integral  combination  of  characteristic  functions  of  half-spaces.  By 
H~b  is  denoted  the  half-space  {x  €  1Zd  :  e  «  x  -f  b  <  0}. 

Theorem 6.1 Let $d$ be a positive integer and let $f : \mathcal{R}^d \to \mathcal{R}$ be compactly supported and 2-times
continuously differentiable. Then

$$f(x) = \int_{S^{d-1} \times \mathcal{R}} w_f(e,b)\,\vartheta(e \cdot x + b)\,de\,db,$$

where for $d$ odd

$$w_f(e,b) = a_d \int_{H^-_{e,b}} \Delta^{k_d} f(y)\,dy,$$

$k_d = (d+1)/2$ and $a_d$ is a constant independent of $f$, while for $d$ even,

$$w_f(e,b) = a_d \int_{H^-_{e,b}} \Delta^{k_d} f(y)\,\alpha(e \cdot y + b)\,dy,$$

where $\alpha(t) = -t \log|t| + t$ for $t \neq 0$ and $\alpha(0) = 0$, $k_d = (d+2)/2$, and $a_d$ is a constant
independent of $f$.

The assumption that $f$ is compactly supported can be replaced by the weaker assumption that $f$
vanishes sufficiently rapidly at infinity. The integral representation also applies to certain nonsmooth
functions that generate tempered distributions.

By an approach reminiscent of the Radon transform but based directly on distributional techniques
from Courant and Hilbert [4], it was shown in [11] that if $f$ is a compactly supported function on
$\mathcal{R}^d$ with continuous $d$-th order partial derivatives, where $d$ is odd, then $f$ can be represented as

$$f(x) = \int_{S^{d-1} \times \mathcal{R}} v_f(e,b)\,\vartheta(e \cdot x + b)\,de\,db,$$

where $v_f(e,b) = a_d \int_{H_{e,b}} (D_e^{(d)} f)(y)\,dy$, $a_d = (-1)^{k-1}(1/2)(2\pi)^{1-d}$ for $d = 2k+1$, $D_e^{(d)}$ is the
directional derivative of $f$ in the direction $e$ iterated $d$ times, $de$ is the $(d-1)$-dimensional volume
element on $S^{d-1}$, and $dy$ is likewise on a hyperplane. Although the coefficients $v_f$ are obtained by
integration over hyperplanes, while the $w_f$ arise from integration over half-spaces, these coefficients
can be shown to coincide by an application of the Divergence Theorem [3] p.423 to the half-spaces
$H^-_{e,b}$. Theorem 6.1 extends the representation of [11] to even values of $d$ and to target functions $f$
which are not compactly supported but which decrease sufficiently rapidly at infinity.

For $w \in \mathcal{C}(S^{d-1} \times \mathcal{R})$ and $f \in \mathcal{D}(\mathcal{R}^d)$ define

$$T_H(w)(x) = \int_{S^{d-1} \times \mathcal{R}} w(e,b)\,\vartheta(e \cdot x + b)\,de\,db,$$

$$S_H(f)(e,b) = w_f(e,b).$$

Theorem 6.1 shows that for each $f \in \mathcal{D}(\mathcal{R}^d)$, $T_H(S_H(f)) = f$. This theorem can also be used to
estimate variation with respect to half-spaces by the $\mathcal{L}_1$-norm of the weighting function $w_f = v_f$. It
is shown in [11] that for any $f$ to which the above representation applies,

$$\|f\|_{H_d} \le \int_{S^{d-1} \times \mathcal{R}} |w_f(e,b)|\,de\,db.$$

Combining this upper bound on $H_d$-variation with Corollary 5.2, we get a smoothness condition
that defines sets of functions that can be approximated by $\mathrm{span}_n H_d$ with rates of the order of $1/\sqrt{n}$.

Abstract

In  this  paper  we  present  a  use  of  splines  in  the  biomedical  field. 


1  Introduction 

In the surgical field of ophthalmology, refractive surgery has expanded considerably over the
past fifteen years. It allows surgeons to correct various refractive errors (myopia, hyperopia,
astigmatism), with the aim of reducing or eliminating the use of optical equipment such as
glasses and contact lenses. Many surgical techniques are available to experts today, each with
specific indications. Developing these methods usually takes time and requires many research
studies on animals before any clinical approach. In general, nomograms are established for all
procedures. They provide the surgeon with rules for carrying out the surgery. These nomograms
are usually based on statistical analysis of the first large series of operated patients. However,
up to now, no technique has been able to take into account the individual variability of eyes
(morphology, physiology).

The purpose of the present article is to take this variability into account by building a
three-dimensional numerical model of the eye and then applying to it various simulations of
surgical techniques in order to measure their effects.

2  Eye  and  vision

2.1  The  eye  anatomy 

Schematically, the eyeball is roughly spherical, with a vertical diameter of approximately 23 mm
and an antero-posterior diameter (the axial length) about 2 mm longer. Its average volume is
6.5 cm$^3$ for a weight of 7 grams.

2.2  Refractive  errors 

When parallel rays reach a normal eye, they are refracted and converge on the retina without
accommodation (this is called emmetropia). Errors of refraction come from a mismatch between
the refractive power of the anterior segment of the eye and the length of the eye; the light rays
are no longer focused on the retina. This is called ametropia, and is mainly of three types:
myopia, hyperopia and astigmatism.

3  Correction  of  ametropia 

3.1  Optical  equipment 

Glasses or contact lenses represent the traditional method. Glasses are safe and reversible for
correction of most refractive errors, but they can be responsible for visual field reduction and
prismatic aberrations. They can also be a source of discomfort and cosmetic impairment for the
wearer. Contact lenses have solved most of the problems associated with glasses, but require
very strict hygiene to avoid severe complications. Refractive surgery can offer an answer to these
various problems.

3.2  Refractive  surgery 

Many techniques are available today in refractive surgery. Most of them aim to reshape the
cornea using an excimer laser (193 nm). This laser (emitting in the far UV) is used in two
distinct procedures:

—  The  Photo  Refractive  Keratomileusis  (PRK) 

—  Laser  Assisted  In  Situ  Keratomileusis  (LASIK). 

The PRK technique removes corneal tissue at the surface by breaking molecular bonds.
The depth and size of the ablation are determined as a function of the attempted correction.
In LASIK the ablation is performed after cutting a thin corneal flap (160 µm). This flap is
then replaced over the area of stromal ablation. In general PRK is used for correction of low
ametropia and LASIK for low and medium corrections. For high corrections other concepts have
been developed (additive surgery).

4  Data  acquisition 

In  order  to  reconstruct  the  eyeball  in  3D,  data  from  the  eye  under  consideration  are 
needed.  Numerous  modalities  allow  us  to  obtain  information  about  the  eye  anatomy. 

4.1  Ultrasound 

Ultrasound scanning uses ultrasound waves to investigate human tissues in vivo. In
ophthalmology it is now a routine examination for the posterior segment of the eye, especially
when searching for an intra-ocular foreign body. There are several reasons for this intensive use,
including its non-invasive nature, speed and low cost. But problems remain, which define the
current limits of ultrasound. Multiple reflections (between two internal interfaces, or between
an interface and the transducer itself) create false echoes. Inaccuracies increase quickly with
the depth of investigation because of all the sources of "background noise", such as diffraction,
diffusion and refraction. The advantages of ultrasound allowed us to use it without constraint
to obtain maximum image quality. Our first task was to set up a high-quality image acquisition
protocol. The protocol favors the immersion method, to obtain good acoustic coupling between
the probe and the eye. The patient lies on his back, wearing a diving mask without its pane.
This mask is filled with physiological saline. The probe, equipped with an illuminated target,
is immersed in the liquid. The patient fixates on the target; under these conditions the images
obtained are along the optical axis. The operator rotates the probe manually, in regular steps,
around the optical axis, and so obtains a volume of data. A computer equipped with an image
acquisition board saves all images to a hard disk. The image resolution depends on the probe
and on the frequency of the ultrasound used.

4.2  MRI 

MRI, which localizes hydrogen nuclei by measuring their magnetization, produces a true
grey-scale map of the proton concentration of the various structures examined [1]. The resolution
of the resulting data volume depends on the acquisition time, which is currently one of the main
limitations of this technique. Besides the high quality of the images obtained, MRI probably has
no harmful effects because it does not use ionizing radiation.

4.3  Computerized  corneal  topography 

The anterior surface of the cornea is a fundamental element of refraction. Any modification or
abnormality of this surface modifies visual acuity, so knowledge of its shape is extremely
important. Traditionally, the Javal keratometer is used to measure the refractive power of the
cornea at individual points. In the last few years ophthalmologists have become used to another
system, computerized corneal topography [2]. This technique, based on the reflection of Placido
discs and the analysis of their deformation, provides a large amount of data on the topography
of the cornea. The curvature of the cornea is represented on a colored map.

4.4  Visible  Human  images 

The images of the Visible Human project (the photographic modality) have high spatial
resolution. They allow us to carry out reconstruction tests without acquisition problems.

5  Data  segmentation 

The purpose of this section is to assign a weight to each pixel of the image. The greater
the weight, the greater the contribution of that pixel to the reconstruction of the edge.

5.1  Pretreatments

Little pretreatment was applied to the images of the various modalities. Speckle filtering or
a contrast enhancement filter has a clear visual effect, but the reconstruction does not seem to
be affected in our specific case. The only pretreatment used is a supervised one: the
ophthalmologist places four points on each image to isolate the lens, which helps the treatment
filters.

5.2  Treatments 

To assign a weight to each pixel of the image, numerous edge detection filters were tested,
using different methods: LoG (Laplacian of Gaussian), Canny-Deriche, Shen-Castan, and the
operator based on geometrical moments. The most convincing results were obtained with the
Canny operator, which was derived as the solution of a constrained optimization problem [3].
This filter is intended to be an optimal compromise between three criteria: localization,
detection and uniqueness of the response. Note that it is optimized for images corrupted by
additive white Gaussian noise, which is not the case for most of the data used here.

This filter is currently one of the references in edge detection for the quality of its
results; it is regularly used in the literature to evaluate new filters. A recursive
implementation of this operator was developed in [4], giving a significant performance gain.
The three-dimensional filter is obtained by assuming that the filter is separable and forming
a convolution product. This choice is a simple one, but it introduces anisotropies. The
resulting images are difficult to use directly and, as recommended by [5], we extract their
local maxima. This method consists of estimating the gradient direction and keeping only its
watershed.
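The separability assumption can be illustrated with a minimal sketch. It is not the recursive Canny-Deriche filter of [4]: a truncated Gaussian and its first derivative are swapped in as the 1D smoothing and derivative kernels, and the function names, the kernel radius and the edge-replication boundary handling are all assumptions made for the example. Each gradient component is obtained by three successive 1D convolutions (derivative along one axis, smoothing along the other two), and the gradient magnitude can then serve as a pixel weight.

```python
import math

def gaussian_kernel(sigma, radius):
    """Normalized 1D Gaussian smoothing kernel of length 2*radius + 1."""
    k = [math.exp(-(t * t) / (2.0 * sigma * sigma)) for t in range(-radius, radius + 1)]
    total = sum(k)
    return [v / total for v in k]

def gaussian_derivative_kernel(sigma, radius):
    """1D first derivative of the Gaussian kernel (antisymmetric)."""
    g = gaussian_kernel(sigma, radius)
    return [-t / (sigma * sigma) * g[t + radius] for t in range(-radius, radius + 1)]

def convolve_axis(vol, kernel, axis):
    """Convolve a 3D nested-list volume with a 1D kernel along one axis."""
    dims = (len(vol), len(vol[0]), len(vol[0][0]))
    radius = len(kernel) // 2
    out = [[[0.0] * dims[2] for _ in range(dims[1])] for _ in range(dims[0])]
    for z in range(dims[0]):
        for y in range(dims[1]):
            for x in range(dims[2]):
                pos = [z, y, x]
                centre = pos[axis]
                acc = 0.0
                for j, weight in enumerate(kernel):
                    # Convolution index, clamped to the volume (edge replication).
                    pos[axis] = min(max(centre - (j - radius), 0), dims[axis] - 1)
                    acc += weight * vol[pos[0]][pos[1]][pos[2]]
                out[z][y][x] = acc
    return out

def gradient_magnitude(vol, sigma=1.0, radius=3):
    """Gradient magnitude of a volume using separable 1D kernels."""
    smooth = gaussian_kernel(sigma, radius)
    deriv = gaussian_derivative_kernel(sigma, radius)
    components = []
    for d_axis in range(3):
        g = vol
        for axis in range(3):
            # Derivative kernel along d_axis, smoothing kernel along the other two.
            g = convolve_axis(g, deriv if axis == d_axis else smooth, axis)
        components.append(g)
    gz, gy, gx = components
    return [[[math.sqrt(gz[z][y][x] ** 2 + gy[z][y][x] ** 2 + gx[z][y][x] ** 2)
              for x in range(len(vol[0][0]))]
             for y in range(len(vol[0]))]
            for z in range(len(vol))]
```

On a synthetic volume containing a single planar step, the magnitude peaks on the two voxels either side of the step and vanishes far from it. Because the same kernel is applied per axis regardless of the voxel spacing, a volume with coarser spacing between slices than within slices will be filtered anisotropically, which is one way the anisotropies mentioned above can arise.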

5.3  Post  treatments 

The previous stages can be applied to any type of image without taking its content into
account. Two types of post-treatment are presented that take the peculiarities of the eye into
account. The first exploits particularities of the ultrasound and MRI images: the center of the
eye contains no edges, and in general the first visible edge is the correct one. The "Visible
Human" project images [10] have specific characteristics. They are in fact photographs of
frozen tissue; ice crystals are clearly visible in the vitreous, whereas it appears uniform in
the other modalities. A simple threshold is ineffective. The hysteresis threshold introduced
by [3] takes into account the connectivity of the edges and their luminance (grey levels), and
gives good results on such images.
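As an illustration of hysteresis thresholding, here is a minimal 2D sketch. The function name, the 8-neighbour connectivity and the list-of-lists image format are assumptions made for the example, not details given in the text. Pixels at or above the high threshold are kept as seeds, and pixels at or above the low threshold are kept only if they are connected to a seed.

```python
from collections import deque

def hysteresis_threshold(img, low, high):
    """Return a boolean mask: strong pixels plus weak pixels connected to them."""
    rows, cols = len(img), len(img[0])
    keep = [[False] * cols for _ in range(rows)]
    queue = deque()
    # Seed with the strong pixels (response >= high).
    for r in range(rows):
        for c in range(cols):
            if img[r][c] >= high:
                keep[r][c] = True
                queue.append((r, c))
    # Grow along 8-connected neighbours whose response is >= low.
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if (0 <= rr < rows and 0 <= cc < cols
                        and not keep[rr][cc] and img[rr][cc] >= low):
                    keep[rr][cc] = True
                    queue.append((rr, cc))
    return keep
```

An isolated weak response, such as an ice crystal far from any strong edge, is discarded, whereas a weak stretch of a contour that touches a strong one is retained. A single global threshold cannot make this distinction.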

6  Eyeball  rebuilding  with  splines 

The most widely used techniques for edge reconstruction in medical images are snakes (active
contour models) [7,8]. A shape approximating the organ to be reconstructed is initialized,
then deformed locally to fit the data. These deformations generally use the physical properties
of elastic materials. These methods allow the edges of organs of varied shapes, such as bones,
the heart or the brain, to be reconstructed. This type of reconstruction is effective, but
numerous parameters must be set. We opted for a different technique. The edge to be
reconstructed in our case has a quasi-spherical shape, and we reconstruct it using B-splines.
Their mathematical properties allow us to reconstruct the edge effectively and quickly, while
adjusting only a few parameters.

6.1  Principle 

For a B-spline (1D) on $R = [a, b]$ we have to set:

—  the  degree  k  of  the  spline, 

—  the position and number of the knots ($\lambda_i$, $i = 0, \ldots, g+1$),

—  the coefficients $c_i$ of the spline representation:

$$s(x) = \sum_{i=-k}^{g} c_i N_{i,k+1}(x),$$

where $N_{i,k+1}(x)$ is the B-spline basis function.
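For readers who want to evaluate such a representation, the following is a minimal sketch using the Cox-de Boor recursion. It is an illustration rather than the implementation used in this work: the function names are hypothetical, the indices are 0-based over an extended knot vector (boundary knots repeated) instead of running from $-k$ to $g$, and the half-open intervals mean the right end point of the domain is not handled.

```python
def bspline_basis(i, order, knots, x):
    """Value at x of the B-spline of the given order (degree + 1) starting at knots[i]."""
    if order == 1:
        # Piecewise-constant B-spline on the half-open interval [knots[i], knots[i+1]).
        return 1.0 if knots[i] <= x < knots[i + 1] else 0.0
    left = right = 0.0
    span_left = knots[i + order - 1] - knots[i]
    if span_left > 0.0:
        left = (x - knots[i]) / span_left * bspline_basis(i, order - 1, knots, x)
    span_right = knots[i + order] - knots[i + 1]
    if span_right > 0.0:
        right = (knots[i + order] - x) / span_right * bspline_basis(i + 1, order - 1, knots, x)
    return left + right

def spline_value(coeffs, degree, knots, x):
    """Evaluate s(x) = sum_i c_i N_{i,degree+1}(x) for 0-based coefficient indices."""
    if len(coeffs) != len(knots) - degree - 1:
        raise ValueError("need len(knots) - degree - 1 coefficients")
    return sum(c * bspline_basis(i, degree + 1, knots, x) for i, c in enumerate(coeffs))

# Example: cubic spline on [0, 3] with interior knots 1 and 2, boundary knots repeated.
knots = [0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 3.0, 3.0, 3.0, 3.0]
constant = spline_value([2.0] * 6, 3, knots, 1.5)   # basis functions sum to one
```

The recursion is simple but recomputes lower-order values many times; production code would normally use a stable triangular scheme (de Boor's algorithm) instead.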

We have chosen to set the degree of the spline to 3. Tests indicate that this is a good
compromise between computing time and quality of the result. The determination of the other
parameters depends on the approximation criterion used and on the position of the knots.

6.1.1  The  Dierckx  criterion  [6]

The Dierckx approximation criterion determines a spline as the solution of a constrained
minimization problem:

minimize

$$\eta := \sum_{i=1}^{g} \left( s^{(k)}(\lambda_i+) - s^{(k)}(\lambda_i-) \right)^2$$

with the constraint

$$\delta := \sum_{r=1}^{m} \left( w_r (y_r - s(x_r)) \right)^2 \le S,$$

where $(x_r, y_r)$ are the coordinates of the $m$ data points, with $w_r$ the associated weight.

6.1.2  Control  knots  number 

As the number of knots grows, the smoothness of the curve decreases. Using this property we
set up an iterative algorithm to compute the spline. After an initialization with few knots
(we set, for example, $\lambda_0 = a$, $\lambda_2 = b$ and $\lambda_1 = (a+b)/2$), the spline is computed. If it is
too smooth (according to the criterion above), we add some knots and estimate the smoothness
again. Otherwise we stop the algorithm. At each iteration we can insert one or more knots.
The distribution of the knots is recomputed at each iteration. They can be linearly
distributed over $R = [a, b]$.

This method can be generalized to surfaces without difficulty (see [6]) using spherical
coordinates and periodic boundary conditions.

6.2  Results

The results are presented in either a 3D or a 2D view. In the 2D view, the spline is drawn
in red, and represents the intersection of the 2D spline and the data volume. The main
reconstruction errors are due to segmentation errors. Moreover, the more data there are, the
better the quality of the reconstruction. The reconstruction from the Visible Human images
(22 slices) is better than from the MRI (8 slices) and the ultrasound images (4 slices).

Fig.  3.  Reconstruction  using  ultrasound  images.


7  Elastic  modelisation  of  surgery 

7.1  Method  used 

The finite element method is used to simulate surgery and solve the elasticity problem.
At present, limited knowledge of the constitutive law of the eyeball tissues is the main
limitation of this approach. The literature reports a wide range of coefficients to describe
these tissues; in fact they seem to vary between individuals. We therefore use the
approximation of [9] for the elasticity coefficients, which uses three parameters: internal
pressure, radius of the eye and thickness of the edge. More complex models would not offer
much additional information, because of the low precision of the data that we used.

7.2  Results 

Numerous simulations have been carried out. The results seem good in spite of the approximate
constitutive law and the computing time of the finite element method. The results are
represented by a color map of the eye showing the radius of curvature, in the form used by
ophthalmologists.


Fig.  4.  Excimer  Simulation  before  (left)  and  after  (right). 

8  Conclusion

This article presents a highly modular approach to modelling refractive surgery. Each part of
this work can be modified independently and adapted to another organ. All of this work has
been validated by ophthalmologists. Eyeball reconstruction using splines appears to be an
efficient method with a low CPU time. The mechanical model provides reasonable results despite
several approximations. This study might be useful for medical doctors, and also for testing
new surgical techniques.

Abstract

The least absolute deviations criterion, or the $\ell_1$ norm, is frequently used for
approximation where the data may contain outliers or 'wild points'. One of the most popular
methods for solving the least absolute deviations data fitting problem is the Barrodale
and Roberts (BR) algorithm (1973), which is based on linear programming techniques
and the use of a modified simplex method [1]. This algorithm is particularly efficient.
However, since it is based upon the simplex method it can be susceptible to the
accumulation of unrecoverable rounding errors caused by using an inappropriate pivot. In
this paper we shall show how we can extend a numerically stable form of the simplex
method to the special case of $\ell_1$ approximation whilst still maintaining the efficiency of
the Barrodale and Roberts algorithm. This extension is achieved by using the $\ell_1$
characterization to rebuild the relevant parts of the simplex tableau at each iteration. The
advantage of this approach is demonstrated most effectively when the observation matrix
of the approximation problem is sparse, as in the case when using compactly supported
basis functions such as B-splines. Under these circumstances the new method is
considerably more efficient than the Barrodale and Roberts algorithm as well as being more
robust.


1  Introduction 


Given a set of $m$ data points $\{(x_i, y_i)\}_{i=1}^{m}$, the $\ell_1$, or least absolute deviations,
curve-fitting problem seeks $c \in \mathbb{R}^n$ to solve the optimization problem

$$\min_{c} \|y - Ac\|_1 = \sum_{i=1}^{m} \Big| y_i - \sum_{j=1}^{n} a_{i,j} c_j \Big|, \qquad (1.1)$$

where $A$ is an $m \times n$ observation matrix, and $r_i = y_i - \sum_{j=1}^{n} a_{i,j} c_j$ denotes the residual of
the $i$th point.

Another way of stating the $\ell_1$, or least absolute deviations, curve-fitting problem is by
the characterization theory of an $\ell_1$ solution [8], which may be given in different forms.
The following is perhaps the most commonly used.

A vector $c \in \mathbb{R}^n$ solves the minimization problem (1.1) if and only if there exists
$\lambda \in \mathbb{R}^m$ such that

$$A^T \lambda = 0 \quad \text{with} \quad
\begin{cases} |\lambda_i| \le 1, & \text{for } i \in Z, \\ \lambda_i = \mathrm{sign}(r_i), & \text{for } i \notin Z, \end{cases} \qquad (1.2)$$

where $Z$ represents the set of indices for which $r_i = 0$.
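A small worked example, added here for illustration and not part of the original paper, shows the characterization at work. It reads the condition as $A^T\lambda = 0$ with $|\lambda_i| \le 1$ on $Z$ and $\lambda_i = \mathrm{sign}(r_i)$ elsewhere, and takes the simplest case $n = 1$ with a constant basis function, so that $A = (1, 1, 1)^T$, with data $y = (1, 2, 10)^T$:

```latex
\begin{aligned}
c = 2:&\quad r = (-1,\, 0,\, 8), \quad Z = \{2\}, \quad \lambda_1 = -1,\ \lambda_3 = +1,\\
      &\quad A^T\lambda = \lambda_1 + \lambda_2 + \lambda_3 = 0
        \;\Longrightarrow\; \lambda_2 = 0, \quad |\lambda_2| \le 1 \ \text{(optimal)};\\
c = \tfrac{13}{3}:&\quad r = \left(-\tfrac{10}{3},\, -\tfrac{7}{3},\, \tfrac{17}{3}\right), \quad Z = \emptyset,\\
      &\quad A^T\lambda = -1 - 1 + 1 = -1 \neq 0 \ \text{(not optimal)}.
\end{aligned}
```

The optimal constant is the median of the data, which interpolates one point and is unaffected by the outlier 10, whereas the mean $13/3$ fails the condition.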

One of the popular methods designed for solving the $\ell_1$ approximation problem is the
Barrodale and Roberts (BR) algorithm. It replaces the unconstrained variables $c$ and $r$
in (1.1) by nonnegative variables $c^+$, $c^-$, $u$ and $v$, and considers the linear programming
problem

$$\begin{array}{ll}
\min\limits_{c} & e^T u + e^T v \\
\text{subject to} & Ac^+ - Ac^- + u - v = y, \\
 & c^+, c^-, u, v \ge 0.
\end{array} \qquad (1.3)$$
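The variable splitting can be checked numerically with a short sketch. The helper names below are hypothetical and the code only builds a feasible point of the linear programme from a given coefficient vector; it does not solve the programme. It splits $c$ and the residual $r = y - Ac$ into positive and negative parts, so that the objective $e^T u + e^T v$ equals the sum of absolute residuals.

```python
def split_nonnegative(values):
    """Split a vector into nonnegative parts with values = pos - neg."""
    pos = [max(v, 0.0) for v in values]
    neg = [max(-v, 0.0) for v in values]
    return pos, neg

def lp_point_from_coefficients(A, y, c):
    """Build (c_plus, c_minus, u, v) from unconstrained coefficients c."""
    residual = [yi - sum(a * cj for a, cj in zip(row, c)) for row, yi in zip(A, y)]
    c_plus, c_minus = split_nonnegative(c)
    u, v = split_nonnegative(residual)
    return c_plus, c_minus, u, v

def constraint_defect(A, y, c_plus, c_minus, u, v):
    """Row-wise value of A c+ - A c- + u - v - y (all zeros when feasible)."""
    defect = []
    for row, yi, ui, vi in zip(A, y, u, v):
        fitted = sum(a * (p - q) for a, p, q in zip(row, c_plus, c_minus))
        defect.append(fitted + ui - vi - yi)
    return defect

# Straight-line basis (1, x) at x = 0, 1, 2, 3, with one outlying observation.
A = [[1.0, float(x)] for x in range(4)]
y = [0.0, 1.0, 2.0, 10.0]
c_plus, c_minus, u, v = lp_point_from_coefficients(A, y, [1.0, -0.5])
objective = sum(u) + sum(v)   # equals the sum of absolute residuals
```

At such a point at most one of $u_i$, $v_i$ is nonzero for each $i$, which is why minimizing $e^T u + e^T v$ over the feasible set reproduces the $\ell_1$ objective.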

Much of the reason for the popularity of the BR algorithm is that it exploits the
characteristics of the $\ell_1$ approximation in order to solve the problem in a more efficient
manner than the general simplex approach. However, it is a simplex based method, and
so  it  is  susceptible  to  numerical  instabilities  caused  by  using  inappropriate  pivots.  The 
new  method  presented  here  uses  matrix  factorization  instead  of  simplex  pivoting.  This 
approach  allows  numerically  stable  updates  to  be  made,  thus  avoiding  the  unnecessary 
build-up  of  rounding  errors.  This  method  is  particularly  efficient  when  the  observation 
matrix  is  large  and  sparse  [5]. 

Bartels  [2]  and  Gill  and  Murray  [4]  presented  methods  that  concentrate  on  avoiding 
the  inherent  instability  of  the  simplex  method.  However,  these  methods  are  designed  for 
a  general  linear  programming  problem  and  if  we  were  to  employ  these  techniques  for  the 
special case of the $\ell_1$ problem, the storage requirements and computational workload of
the  method  would  be  unnecessarily  large  compared  to  those  of  the  highly  efficient  BR 
algorithm. 

The $\ell_1$ problem is, in essence, an interpolation problem. The aim of any iterative
procedure for the $\ell_1$ problem is to find an optimal set of interpolation points. Indeed,
this is how the BR algorithm solves the $\ell_1$ problem. It begins with all coefficients, $c$,
set to zero (being non-basic variables), and during each iteration of stage one, one of
the residuals, $r_i$, becomes non-basic by making the corresponding point an interpolation
point (i.e., the coefficients are altered so that $r_i = 0$). At the end of stage one, the current
estimate interpolates $n$ distinct points. During stage two, the interpolation points are
exchanged one at a time with a non-interpolation point until an optimal solution is
achieved.

In  fact,  the  new  algorithm  is  effectively  identical  to  the  BR  algorithm  in  the  sense  that 
we  use  exactly  the  same  pivoting  strategy.  However,  we  start  with  a  predetermined  set 
of  interpolation  points  and  do  not  store  the  simplex  tableau  directly.  In  each  iteration, 
we  only  reconstruct  the  parts  of  the  simplex  tableau  that  are  needed  by  the  more  stable 
approach  employed. 

2  A  more  stable  computational  approach

The linear programming formulation of the least absolute deviations curve-fitting problem
is given in (1.3). It is a standard linear programming problem of dimension $m \times (2m+2n)$.
The robust approaches of Bartels and of Gill and Murray can be applied to solve it. They
involve the factorization of an $m \times m$ matrix. On the other hand, the BR algorithm only
deals with an $m \times n$ matrix in each iteration, so if $m \gg n$, the direct use of these stable
approaches is less efficient. We shall show next that the factorization of an $n \times n$ matrix
is all that is required at each iteration.

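We split the data points based on the interpolation set $Z$, and let $A_Z$, $y_Z$, $u_Z$ and
$v_Z$ be the counterparts of $A$, $y$, $u$ and $v$ in (1.3) corresponding to the set $Z$. Their
complementary matrix and vectors are denoted by $A_{\bar{Z}}$ and $y_{\bar{Z}}$, $u_{\bar{Z}}$ and $v_{\bar{Z}}$, so that $A_Z$
and $A_{\bar{Z}}$ comprise $A$, etc. Problem (1.3) can then be expressed as

$$\begin{array}{ll}
\min\limits_{c} & e^T(u_Z + u_{\bar{Z}}) + e^T(v_Z + v_{\bar{Z}}) \\
\text{subject to} & A_Z c^+ - A_Z c^- + u_Z - v_Z = y_Z, \\
 & A_{\bar{Z}} c^+ - A_{\bar{Z}} c^- + u_{\bar{Z}} - v_{\bar{Z}} = y_{\bar{Z}}, \\
 & c^+, c^-, u_Z, u_{\bar{Z}}, v_Z, v_{\bar{Z}} \ge 0.
\end{array} \qquad (2.1)$$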

Since the coefficients for $c_j^-$ are just the negative of the coefficients for $c_j^+$, $j = 1, 2, \ldots, n$,
it is possible to suppress $c_j^-$ and let $c$ represent the unconstrained variable. The initial
simplex tableau associated with problem (2.1) can be constructed in matrix form as in
Table 1, where $e_k$, $k = m, n, m-n$, are $k \times 1$ vectors with all components equal to one.


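$$\begin{array}{c|ccccc|c}
\text{BV} & c & u_Z & u_{\bar{Z}} & v_Z & v_{\bar{Z}} & r \\ \hline
u_Z & A_Z & I & 0 & -I & 0 & y_Z \\
u_{\bar{Z}} & A_{\bar{Z}} & 0 & I & 0 & -I & y_{\bar{Z}} \\ \hline
\text{objective} & e_m^T \begin{pmatrix} A_Z \\ A_{\bar{Z}} \end{pmatrix} & 0 & 0 & -2e_n^T & -2e_{m-n}^T & e_m^T \begin{pmatrix} y_Z \\ y_{\bar{Z}} \end{pmatrix}
\end{array}$$

Tab.  1.  The initial simplex tableau of the $\ell_1$ fitting problem.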


As we know, the simplex method is an iterative procedure in which each iteration is
characterized by specifying which $m$ of the $2m + n$ variables are basic. For the $\ell_1$
approximation, we are only concerned with those vertices which are formed by a set of
interpolation points. For $n$ interpolation points, the basic variables consist of the $n$
coefficient parameters $c$ and the $m - n$ parameters $u_{\bar{Z}}$ corresponding to the
non-interpolation points.

Let $B$ be the $m \times m$ basis matrix whose columns consist of the $m$ columns associated
with the basic variables. Then

Tab.  2.  The  condensed  simplex  tableau  associated  with  a  set  of  interpolation
points. 


It is readily verified that the inverse of $B$ can be written in the form of (2.3) as long
as $A_Z$ is invertible.


Equation (2.3) shows that the explicit inverse computation of an $m \times m$ matrix in the
form of (2.2) can be achieved by dealing with the inverse of an $n \times n$ matrix, and in
general, $n \ll m$.

To make the $m$ non-basic variables become basic, we multiply the whole simplex
tableau by $B^{-1}$, and omit the identity and zero matrices. The new simplex tableau is
given in Table 2.

An arbitrary choice of the interpolation set $Z$ may cause some of the values in the
right hand side column to become negative. Although it is permissible for the coefficient
parameters $c$ to be negative, for those rows having negative residuals $r_{\bar{Z}}$, we restore
feasibility by exchanging the corresponding $u_{\bar{Z}}$ for $v_{\bar{Z}}$. This exchange can be made by
subtracting twice those rows from the objective row and changing the sign of the original
rows [1].

Such an exchange process can be expressed in matrix terms by introducing a sign
vector

$$\lambda_{\bar{Z}} = \mathrm{sign}(r_{\bar{Z}}).$$

Let $A_{\bar{Z}s}$ represent the matrix which is obtained by multiplying those rows of $A_{\bar{Z}}$
associated with negative residuals by $-1$,

$$A_{\bar{Z}s} = \mathrm{diag}(\lambda_{\bar{Z}}) A_{\bar{Z}}.$$


Tab.  3.  Restoration  of  feasibility  of  the  simplex  tableau. 


The  simplex  tableau  after  restoring  feasibility  is  shown  in  Table  3. 

The point to be removed from $Z$ is decided by the values of the objective row. At each
iteration the maximum value of the objective row (including the suppressed columns) is
chosen; let the index of this element be $k$. In order to choose which new point is
to join the set $Z$, we compute the values of the pivotal column, the $k$th column in the
simplex tableau. Since the simplex tableau is in the form of


the $k$th column can be obtained by using $A_{\bar{Z}s}$ and the $k$th column of $A_Z^{-1}$.

The  BR  algorithm  pivoting  strategy  is  adopted  to  decide  which  new  point  is  to  be 
added  to  the  interpolation  set,  when  a  new  set  of  indices  Z  is  generated.  We  repeat  the 
process  in  an  iterative  manner  until  the  optimal  solution  is  achieved. 

Table  3  is  in  fact  identical  to  the  simplex  tableau  of  the  BR  algorithm  in  stage  2. 
The  difference  here  is  that  the  BR  algorithm  is  implemented  by  a  simplex  pivoting 
approach,  while  the  transformation  of  the  simplex  tableau  in  the  form  of  Table  3  can  be 
accomplished  in  a  numerically  more  stable  manner. 

3  The  improved  method 

The improved method starts with a predetermined interpolation set $Z$, the minimum
requirement for $Z$ being that it forms a well-behaved matrix $A_Z$. For B-spline basis
functions, we can choose any set of points satisfying the Schoenberg-Whitney condition [6].
For a Chebyshev polynomial basis, points close to the $n$ Chebyshev zeros can be regarded
as the initial interpolation set. In other cases, we can choose points approximate to them
or even uniformly distributed.

If we denote the set of $\lambda_i$, $i \in Z$, as $\lambda_Z$, we can rewrite the characterization
equation (1.2) as

$$A_Z^T \lambda_Z = -A_{\bar{Z}}^T \lambda_{\bar{Z}}, \qquad (3.1)$$

and $\lambda_Z$ can be obtained mathematically from

$$\lambda_Z = -(A_Z^T)^{-1} (A_{\bar{Z}}^T \lambda_{\bar{Z}}). \qquad (3.2)$$

Table 3 shows that the objective row can be computed as

$$\text{Objective row} = -(\lambda_{\bar{Z}}^T A_{\bar{Z}}) A_Z^{-1} - e_n^T. \qquad (3.3)$$

Thus, using (3.2) we conclude that

$$\text{Objective row} = \lambda_Z^T - e_n^T. \qquad (3.4)$$

We know that at the $\ell_1$ solution all the values in the objective row are in the range
$[-2, 0]$, and also $|\lambda| \le 1$. This latter result can be explained in terms of the former by
the relationship (3.4).

Equation (3.4) is useful because it can be used to verify whether an interpolation set forms
an optimal solution, or to compute $\lambda$ from the values of the objective row. We use it to
compute the values of the objective row.

The improved method can be summarized as follows:

(1) Choose an initial set of interpolation points and form the set $Z$.

(2) Construct $A_Z$, $y_Z$ and their counterparts $A_{\bar Z}$, $y_{\bar Z}$ accordingly.

(3) Solve the equation $A_Z c = y_Z$ for $c$, and compute

$r_{\bar Z} = y_{\bar Z} - A_{\bar Z} c$, and $\lambda_{\bar Z} = \mathrm{sign}(r_{\bar Z})$.

(4) Obtain the values of $\lambda_Z$ from the equation

$A_Z^T \lambda_Z = -A_{\bar Z}^T \lambda_{\bar Z}$.   (3.5)

(5) If $|\lambda_Z| \le 1$ holds, the current solution is optimal, and the algorithm terminates. Otherwise, continue.

(6) Obtain the objective row of the BR simplex tableau from

objective row $= \lambda_Z - e_n$.

(7) Examine the values of the objective row; the point associated with the maximum value of the objective row is chosen to leave the set $Z$.

(8) Decide the point to add by the BR pivoting strategy. Obtain a new set of indices $Z$, and repeat from step 2.
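Steps (3) to (6) can be illustrated by a minimal Python sketch. It is not the authors' MATLAB code: the function names, the small dense solver and the five-point straight-line data are assumptions made for illustration only, and the pivoting of steps (7) and (8) is not covered.

```python
def solve(M, b):
    """Solve the square system M x = b by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(map(float, M[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def check_interpolation_set(A, y, Z, tol=1e-12):
    """Carry out steps (3)-(6) for the interpolation set Z (a list of n row indices)."""
    n = len(A[0])
    Zbar = [i for i in range(len(A)) if i not in Z]
    A_Z = [A[i] for i in Z]
    c = solve(A_Z, [y[i] for i in Z])                      # step (3): A_Z c = y_Z
    r = [y[i] - sum(A[i][j] * c[j] for j in range(n)) for i in Zbar]
    lam_bar = [(ri > 0) - (ri < 0) for ri in r]            # sign of each residual
    rhs = [-sum(A[i][j] * s for i, s in zip(Zbar, lam_bar)) for j in range(n)]
    A_Z_T = [[A_Z[i][j] for i in range(n)] for j in range(n)]
    lam_Z = solve(A_Z_T, rhs)                              # step (4): equation (3.5)
    optimal = all(abs(l) <= 1.0 + tol for l in lam_Z)      # step (5)
    objective_row = [l - 1.0 for l in lam_Z]               # step (6): lambda_Z - e_n
    return c, lam_Z, objective_row, optimal


# Hypothetical data: fit c0 + c1 * t in the l1 sense to five points.
t = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-2.3, -1.0, 0.4, 1.0, 1.5]
A = [[1.0, ti] for ti in t]
```

For these data the set Z = [1, 3] passes the test of step (5), whereas Z = [0, 1] does not and would trigger steps (7) and (8). A zero residual outside Z gives sign 0 here, which corresponds to a degenerate case that the sketch does not treat.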

4 Practical considerations and application to spline approximation

The robustness of the above algorithm stems from the reliable updating of the relevant parts of the simplex tableau in each iteration. The major computational work is obtaining (explicitly or implicitly) the inverse of an $n \times n$ matrix $A_Z$. It can be calculated and stored explicitly by using an LU or QR factorization, or preferably it can be expressed as a product of factors. Since $A_Z$ differs from its predecessor by only one row, savings can be made by reusing results from the previous step. Stable implementations of this row-updating procedure are described in [4, 7].

m = 512

          Number of iterations      Execution time (seconds)
   q        New        BR             New        BR
  44         57       125             1.6      14.7
  49         75       111             2.2      13.4
  54         71       134             2.4      20.2
  59         83       156             3.0      26.8
  64         78       160             3.1      32/
  69         88       194             4.0      42.4
  74         75       165             3.7      36.0
  79         87       189             4.8      48.1

Tab.  4.  The  number  of  iterations  and  execution  time  taken  by  the  algorithm  of 
this  paper  and  the  Barrodale  and  Roberts  algorithm  for  a  set  of  512  response  data 
points  provided  by  the  National  Physical  Laboratory. 


Sparsity is almost always more important than matrix dimension. Additional savings can be made if the observation matrix $A$ is sparse or structured. Approximation using a B-spline basis often occurs in practical applications. In such cases, $A$ is block banded, and $A_Z$ can be triangularized using $O(n)$ flops [3]. Similarly, the sparsity of $A$ can be exploited to compute other relevant parts of the simplex tableau efficiently.

We have applied our method to least absolute deviations curve-fitting problems with B-splines, using various numbers of interior knots. All software was written in MATLAB and implemented on a Sun Workstation. The initial interpolation points are chosen to be those points corresponding to the maximum value in each column of the observation matrix $A$.
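The choice of the initial interpolation points can be sketched in a few lines of Python. The function name and the small matrix of hat functions (linear B-splines) are assumptions for illustration; ties are resolved by taking the first maximal row, and the sketch does not guard against two columns selecting the same row.

```python
def initial_interpolation_set(A):
    """Return, for each column of A, the index of the row holding its maximum value."""
    m, n = len(A), len(A[0])
    return [max(range(m), key=lambda i: A[i][j]) for j in range(n)]


# Hypothetical observation matrix: hat functions centred at 0, 2, 4, sampled at t = 0..4.
A_hat = [[max(0.0, 1.0 - abs(t - knot) / 2.0) for knot in (0, 2, 4)] for t in range(5)]
Z0 = initial_interpolation_set(A_hat)
```

For a B-spline basis each column peaks near the centre of the support of the corresponding spline, so the selected rows are spread across the data range.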

Some of our computational results are reported in Tables 4 and 5. Each table reports the results for one data set, obtained with the new method and with the BR algorithm.

All the experimental results exhibit the effectiveness of the improved method on large, sparse systems. Although these tables show that the improved method is faster than the BR algorithm, it would be unfair to judge the convergence speed purely on the time taken, since the improved method embodies some MATLAB built-in functions, while the BR algorithm uses only user-defined functions. However, the new method requires far fewer iterations than the BR algorithm (in total about half as many in Table 4 and about 60% as many in Table 5), and is competitive with the BR algorithm both in efficiency and accuracy for a structured system.

Further work to be addressed by the authors will involve a definitive implementation of this algorithm in Fortran, and development of an error analysis for both the improved method and the BR algorithm.

m = 1200

          Number of iterations      Execution time (seconds)
   q        New        BR             New        BR
  50         82       143             4.0      58.7
  56        105       165             5.2      85.8
  62        113       190             6.1     110.2
  68        131       189             7.6     110.4
  74        121       223             7.8     157.9
  80        132       216             9.2     163.2
  86        155       245            11.8     209.8
  92        173       252            14.0     241.8
  98        153       272            13.6     292.6

Tab. 5. The number of iterations and execution time taken by the algorithm of this paper and the Barrodale and Roberts algorithm for a set of 1200 data points, generated by the MATLAB commands x = linspace(1, 10, 1200)'; y = log(x) + randn(1200, 1).

Abstract

Tomography is well known because of its many applications. Although theoretically solved, the numerical implementation of tomographic reconstruction algorithms is still a difficult problem. In this article the numerical implementation of a reconstruction method using Cesaro-means and Newman-Shapiro operators is described. The key point herein is the use of suitable quadrature formulae on the sphere. It turns out that, in the context described, product Gaussian formulae are best suited. The algorithm is tested on the so-called Shepp-Logan phantom, which is a three dimensional model of a human head.

1  Introduction  and  notation 

The problem in tomography is to reconstruct a function $F$ from its Radon transform sufficiently well. Since certain classes of functions can be expanded into series of orthogonal polynomials it is essential to exploit the action of the Radon transform on orthogonal polynomials and on polynomials in general.

This approach is all the more interesting since the inverse of the Radon transform for polynomials is known explicitly.

The  convergence  of  orthogonal  expansions  to  the  given  function  is  often  achieved  only 
by  applying  a  summability  method.  The  application  of  such  methods  can  be  interpreted 
as  a  kind  of  “filter  technique”  which  is  necessary  for  sufficiently  good  reconstructions.  The 
combination  of  an  expansion  of  the  function  and  the  application  of  suitable  summability 
methods  leads  to  promising  reconstruction  algorithms. 

In this article two examples of summability methods and their implementation are presented: the Cesaro-means and the Newman-Shapiro-means. After some introductory remarks on Laplace-series at the end of this section, in Section 2 the theory of summability methods needed here is presented. In Section 3 this theory is applied to the reconstruction of functions from their Radon transform. Section 4 describes the numerical implementation of the reconstruction formula, which is tested on the so-called Shepp-Logan phantom of a head in Section 5.

In this article the following notation is used. Let $B^r$ denote the unit ball in $\mathbb{R}^r$, $S^{r-1}$ the unit sphere, and $Z^r := [-1,1] \times S^{r-1}$. The Euclidean inner product of $x, y \in \mathbb{R}^r$ is written $xy$.

The spaces of restrictions of $r$-variate polynomials, homogeneous polynomials and homogeneous harmonic polynomials of degree $\mu \in \mathbb{N}_0$ onto a subset $X \subset \mathbb{R}^r$ ($X = S^{r-1}$ or $X = B^r$) are denoted by $\mathbb{P}^r_\mu(X)$, $\overset{*}{\mathbb{P}}{}^r_\mu(X)$, $\overset{*}{\mathbb{H}}{}^r_\mu(X)$, respectively. The space $C(S^{r-1})$ of all continuous functions is provided with the inner product $\langle F, G \rangle := \int_{S^{r-1}} F(x) G(x)\,dx$. The surface measure of the sphere is denoted by $\omega_{r-1} = \langle 1, 1 \rangle$.


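Let $C^\lambda_\mu$ denote the Gegenbauer polynomials of degree $\mu$ and index $\lambda$ and $\tilde C^\lambda_\mu = C^\lambda_\mu / C^\lambda_\mu(1)$ the normalized Gegenbauer polynomials. The reproducing kernel function of $\overset{*}{\mathbb{H}}{}^r_\mu(S^{r-1})$ is given by

$G_\mu(xy) = \frac{2\mu + r - 2}{(r-2)\,\omega_{r-1}}\, C^{(r-2)/2}_\mu(xy)$,

the normalized reproducing kernel is defined by $\tilde G_\mu := G_\mu / G_\mu(1)$.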


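Let $Y \in \{C(S^{r-1}), L^2(S^{r-1}), L^p(S^{r-1})\}$. For $f \in Y$ let

$L(f,x) = \sum_{\nu=0}^{\infty} (\Lambda_\nu f)(x) = \sum_{\nu=0}^{\infty} \int_{S^{r-1}} f(y)\, G_\nu(xy)\,dy$   (1.1)

be the Laplace-series of $f$, where $(\Lambda_\nu f)(x) := \int_{S^{r-1}} f(y)\, G_\nu(xy)\,dy$ is the orthogonal projection of $f$ onto $\overset{*}{\mathbb{H}}{}^r_\nu(S^{r-1})$ and the partial sums $L_\mu(f,x) = \sum_{\nu=0}^{\mu} (\Lambda_\nu f)(x)$ are the orthogonal projections of $f$ onto $\mathbb{P}^r_\mu(S^{r-1})$.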

Whereas for $Y = L^2(S^{r-1})$ it is known that the partial sums $L_\mu(f,x)$ converge to $f$ in norm, no convergence is obtained for $Y = C(S^{r-1})$ or $Y = L^p(S^{r-1})$ for $p > 2 + \frac{2}{r-2}$ and $p < 2 - \frac{2}{r}$ (see e.g. [1], p. 211). Applying a summability method the situation changes.


2  Summability  methods 

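Let $A = (a_{\mu\nu})_{\mu,\nu \in \mathbb{N}_0}$ be an infinite matrix for which the elements $a_{\mu\nu} \in \mathbb{R}$ fulfil the following properties.

(i) $a_{\mu\nu} = 0$ for $\nu > \mu$,

(ii) $\lim_{\mu\to\infty} a_{\mu\nu} = 1$ for $\nu \in \{0, 1\}$,

(iii) $K_\mu(\xi) \ge 0$ for $-1 \le \xi \le 1$, where $K_\mu := \sum_{\nu=0}^{\mu} a_{\mu\nu} G_\nu$.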


If with the aid of a summability method the kernel $G_\nu$ in (1.1) is substituted by a kernel

$K_\mu = \sum_{\nu=0}^{\mu} a_{\mu\nu} G_\nu$   (2.1)

then the operator defined by the transformed series

$L_A(f,x) = \lim_{\mu\to\infty} \int_{S^{r-1}} f(y)\, K_\mu(xy)\,dy$   (2.2)

can be shown to converge pointwise to the identity provided that the properties (i)-(iii) of the matrix $A$ are valid for the kernel $K_\mu$.

Remark 2.1 The coefficients $a_{\mu\nu}$ can be obtained from

$a_{\mu\nu} = (L^A_\mu \tilde G_\nu(t\,\cdot))(t) = \int_{S^{r-1}} \tilde G_\nu(tx)\, K_\mu(tx)\,d\omega(x)$, $t \in S^{r-1}$.

For $A$ being the matrix of the Cesaro-means the proof was first given by Kogbetlianz [4]. Berens et al. [1] give a proof for Cesaro-means as well as for Abel-Poisson-means. They also prove results on the order of convergence and the corresponding saturation classes. The convergence proof for Newman-Shapiro operators ($Y = C(S^{r-1})$) can be found in Reimer [7].

2.1 Cesaro-means

For Cesaro-means the coefficients $a_{\mu\nu}$ in the summability method have to be chosen as

$a_{\mu\nu} = \frac{(k+1)_{\mu-\nu}\,(1)_\mu}{(1)_{\mu-\nu}\,(k+1)_\mu}$,   (2.3)

where $(p)_q = p \cdot (p+1) \cdots (p+q-1)$ denotes the Pochhammer symbol. Then the kernels $K_\mu$ in (iii) take on the form

$K_\mu = \frac{(1)_\mu}{(k+1)_\mu} \sum_{\nu=0}^{\mu} \frac{(k+1)_{\mu-\nu}}{(1)_{\mu-\nu}}\, G_\nu$.   (2.4)

Convergence of the transformed Laplace-series (2.2) is valid for $k > (r-2)/2$; for $k \ge r-1$ the operators even are positive (see Kogbetlianz [4]).
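The Cesaro coefficients can be evaluated as in the following Python sketch. It uses the standard form $a_{\mu\nu} = A^k_{\mu-\nu}/A^k_\mu$ with $A^k_n = (k+1)_n/(1)_n$; the second function is an algebraically equivalent running product (an assumption of this sketch, not taken from the paper) that avoids overflow of the Pochhammer symbols for large $\mu$. Function names are hypothetical.

```python
def pochhammer(p, q):
    """Pochhammer symbol (p)_q = p (p+1) ... (p+q-1), with (p)_0 = 1."""
    result = 1.0
    for i in range(q):
        result *= p + i
    return result


def cesaro_coefficient_pochhammer(mu, nu, k):
    """a_{mu,nu} written directly with Pochhammer symbols (suitable for small mu only)."""
    if nu > mu:
        return 0.0                                   # property (i)
    return (pochhammer(k + 1, mu - nu) * pochhammer(1, mu)) / (
        pochhammer(1, mu - nu) * pochhammer(k + 1, mu))


def cesaro_coefficient(mu, nu, k):
    """Same coefficient as a product of ratios n / (k + n), n = mu-nu+1, ..., mu."""
    if nu > mu:
        return 0.0
    a = 1.0
    for n in range(mu - nu + 1, mu + 1):
        a *= n / (k + n)
    return a
```

For $k = 1$ this reduces to the Fejer weights $(\mu - \nu + 1)/(\mu + 1)$, and for fixed $\nu$ the coefficients tend to 1 as $\mu$ grows, in line with property (ii).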

2.2  Newman-Shapiro  summability  method 

In  [8]  Reimer  considers  kernel  polynomials 


K2v+l(£)  :=  K2v(Q  •—  9v+ 1  7 


as  used  by  Newman-Shapiro  [5].  Here,  r}u+\  is  the  largest  root  of  Gu+\  and 


/  on  1  ~  vl+i  (v+r-2\  1  1 

=  (r-2K-i-(27T7F(  >-3  )  =“ 


-  vUi  i 
2is  4*  r  Gv-fi(l) 


The  coefficients  in  the  Newman-Shapiro  operators  can  be  calculated  to  be 

(2i/  +  r)2  (J  +  A)(/  +  A) 

^  =  9v+l'hh^^)2'  (^+1))2  ‘  ^ 


$ZtZr+1r.  (GAv»+i)Y 

miy0  (A)fc  (Xh-k  (A ),-t  (l)J+i-»  (2A)j-fj_fc 

(l)fc  (l)j-fc  (2A)j+(-2*  (A+ l)j+i-fc  v'3+l 


where $\delta_{\nu, j+l-2k}$ denotes the Kronecker delta and $\lambda = \frac{r-2}{2}$.

The matrix $A$ defined by the Newman-Shapiro operators fulfils the properties (i)-(iii) (see Reimer [8]).

Remark 2.2 The corresponding partial sum operators $L^A_\mu$ are nonnegative with positive $a_{\mu\nu}$. For continuous and differentiable functions even more is valid (see Reimer [8]): whereas for continuous functions the approximation error is of order $O(\mu^{-1})$, functions $F \in C^j(S^{r-1})$, $j \in \{1, 2\}$, have an error of order $O(\mu^{-j})$.

3  Application  to  tomography 

The Radon transform $\mathcal{R} : C(B^r) \to C(Z^r)$ is defined by

$(\mathcal{R}F)(s,t) := \int_{v \perp t,\ v^2 \le 1 - s^2} F(st + v)\,dv$, $F \in C(B^r)$, $(s,t) \in Z^r$,   (3.1)

which means that the Radon transform $\mathcal{R}F$ of $F$ is determined by integrating $F$ over all hyperplanes of dimension $r - 1$. This map can also be defined for functions in $L^1(\mathbb{R}^r)$, $L^2(\mathbb{R}^r)$, the Schwartz space $S(\mathbb{R}^r)$ or some Sobolev spaces. $\mathcal{R}$ is continuous on all of these spaces, whereas the inverse $\mathcal{R}^{-1}$ is only continuous on $S(\mathbb{R}^r)$ and on the Sobolev spaces.

For  polynomials  it  is  known  that 

(RC$(a.)){s,t)  =  <5j(s)c|(at),  a  G  Sr^\  (s,t)  G  (3.2) 

(see Davison, Grünbaum [2]) and, more generally,

(npm){s,t)  =  cl{s)pm(t),  ( s,t)ezr ,  (3.3) 

* 

where  the  polynomials  Pm  eJPp  (5r“1)  are  generated  by  the  Gegenbauer  polynomials, 

.  r 

i.e.  ^~-Cp  (ax)  =  J2\m\=fx  a7nPm(x)-  These  polynomials  Pm,  \m\  =  /x,  are  known  to 
constitute  a  basis  for  JP^  (Sr~1). 

'  T. 

Let  Vp  :=  span{Pm  :  \m\  =  pi}.  Since  the  Gegenbauer  polynomials  CJ  can  also  be 

* 

interpreted  as  the  reproducing  kernel  of  JP^+2(5r+1),  the  orthogonal  projection  Fv  of 

F  G  C(Br)  onto  V(f(Br)  can  be  identified  with  the  orthogonal  projection  of  F  onto 
* 

JHfff2{Sr+1)  (see  Reimer  [7]  for  details).  Thus  the  theory  of  Laplace  series  can  be  used 
here  for  the  reconstruction  of  F  from  its  Radon  transform. 

Let $A$ be a matrix transformation as introduced in Section 2 and let $F_\nu$ be the orthogonal projection of $F$ onto $V^r_\nu(B^r)$. Then according to the summability theory of Laplace series $F = \lim_{\mu\to\infty} \sum_{\nu=0}^{\mu} a_{\mu\nu} F_\nu$. Since the Radon transform is linear and continuous, $\mathcal{R}F = \lim_{\mu\to\infty} \sum_{\nu=0}^{\mu} a_{\mu\nu} \mathcal{R}F_\nu$.


It  can 

be  shown  that  (see  Reimer  [7]j 

Fu(x)  =  X U}r  -----  f  (RF)(s,t)Cj?  (s)Cj  (tx)d(s,t), 
r  —  1  Jzr 

(3.4) 

where 

_  (,  -  1)CJ(1)  [',  ^  _  2,  +  r 

UJr-l  *  U)r—2  J- 1  V  / 

(3.5)

From this, after some lengthy calculation using the adjoint operator of $\mathcal{R}$ (which essentially is the inverse operator of $\mathcal{R}$), the reconstruction formula (3.6) follows.

Because of the identification of the orthogonal projection of $F$ onto $V^r_\nu(B^r)$ and onto $\overset{*}{\mathbb{H}}{}^{r+2}_\nu(S^{r+1})$, convergence of the Cesaro-means follows for $k > r/2$, and positivity of the operators is valid for $k \ge r + 1$. For the same reason the coefficients $a_{\mu\nu}$ in the Newman-Shapiro summability method have to be calculated for $\lambda = r/2$.

4  Numerical  implementation 

For the reconstruction of $F$ formula (3.6) was used. As soon as the Radon transform of $F$ is known, the numerical implementation in principle reduces to a stable evaluation of the Gegenbauer polynomials and a suitable approximation of the integrals in (3.6). The Gegenbauer polynomials were evaluated by their recurrence relation (see Szego [11]) which is known to be numerically very stable. The coefficients $a_{\mu\nu}$ for the Cesaro-means and the Newman-Shapiro operators were computed with the aid of formula (2.3) and (2.7), respectively. The factor $\lambda_\nu$ was obtained by (3.5). Since the calculation of $a_{\mu\nu}$ for the Newman-Shapiro operators is very time consuming (more than 10 hours for $\mu > 100$) these coefficients were stored before the main computation was started.
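The evaluation of Gegenbauer polynomials by recurrence can be sketched as follows. The sketch uses the standard three-term recurrence $n\,C^\lambda_n(x) = 2(n+\lambda-1)\,x\,C^\lambda_{n-1}(x) - (n+2\lambda-2)\,C^\lambda_{n-2}(x)$ with $C^\lambda_0 = 1$ and $C^\lambda_1 = 2\lambda x$; the function names are hypothetical and this is not the authors' MATLAB code.

```python
def gegenbauer(n, lam, x):
    """Evaluate the Gegenbauer polynomial C_n^lam(x) by the three-term recurrence."""
    c_prev, c_curr = 1.0, 2.0 * lam * x          # C_0 and C_1
    if n == 0:
        return c_prev
    for m in range(2, n + 1):
        c_next = (2.0 * (m + lam - 1.0) * x * c_curr
                  - (m + 2.0 * lam - 2.0) * c_prev) / m
        c_prev, c_curr = c_curr, c_next
    return c_curr


def normalized_gegenbauer(n, lam, x):
    """C_n^lam(x) / C_n^lam(1), the normalization used for the kernels."""
    return gegenbauer(n, lam, x) / gegenbauer(n, lam, 1.0)
```

For $\lambda = 1/2$ the recurrence reproduces the Legendre polynomials and for $\lambda = 1$ the Chebyshev polynomials of the second kind, which gives two simple checks.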

Since the integrand in (3.6) is a polynomial of degree $\nu + 2$ with respect to $s$ (see (3.6) together with (5.1)), $\int_{-1}^{1} \ldots\, ds$ was approximated by a Gauss-Legendre quadrature of degree $\mu/2 + 1$. This choice ensures that for the evaluation of $\mathcal{R}F(s,t)$ enough evaluations with respect to $s$ are performed and that the integral is evaluated exactly within numerical precision.

For the quadrature on $S^{r-1}$ first an interpolatory quadrature as introduced in [6], p. 132, was used. The weights of such a quadrature formula are obtained as solutions of a linear system of equations $GA = e$, where $e = (1, \ldots, 1)^T \in \mathbb{R}^N$, $N = \dim \mathbb{P}^r_\mu(S^{r-1})$, $A = (A_1, \ldots, A_N)^T$ is the vector of weights and

$G = \frac{1}{\omega_{r-1}} \left( C^{r/2}_\mu(x_j x_k) + C^{r/2}_{\mu-1}(x_j x_k) \right)_{j,k=1}^{N}$.

The points were chosen to be regularly distributed on latitudes of the sphere.

For $\mu > 70$ computational problems occurred in the computation of the weights because of a lack of memory. Apart from this problem, several weights turned out to be negative, which led to oscillations of the reconstruction. Therefore, this interpolatory quadrature was substituted by a product-Gauss formula for the sphere $S^{r-1}$ as suggested by Stroud [10], p. 41. The points and weights of the Gaussian quadrature were computed by the MATLAB program qrule.m which is available via internet from the Mathworks Inc. The number of points of the product Gauss formula is $N = 2M^{r-1}$, where $M = \mu/2 + 1$ is the number of points used in each direction, i.e. $N = 2M^2$ for $r = 3$.
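A product-Gauss rule with $N = 2M^2$ points on $S^2$ can be sketched as below. The layout is an assumption of this sketch (it is not taken from Stroud or from qrule.m): $M$ Gauss-Legendre nodes in the height $z$ combined with $2M$ equally spaced longitudes. The Gauss-Legendre nodes are found by Newton's method on the Legendre polynomial; all names are hypothetical.

```python
import math


def legendre_and_derivative(M, x):
    """Return P_M(x) and P_M'(x) using the Legendre three-term recurrence."""
    p_prev, p = 1.0, x
    for n in range(2, M + 1):
        p_prev, p = p, ((2 * n - 1) * x * p - (n - 1) * p_prev) / n
    dp = M * (x * p - p_prev) / (x * x - 1.0)
    return p, dp


def gauss_legendre(M):
    """Nodes and weights of the M-point Gauss-Legendre rule on [-1, 1]."""
    nodes, weights = [], []
    for i in range(1, M + 1):
        x = math.cos(math.pi * (i - 0.25) / (M + 0.5))   # classical starting guess
        for _ in range(100):
            p, dp = legendre_and_derivative(M, x)
            dx = p / dp
            x -= dx
            if abs(dx) < 1e-15:
                break
        _, dp = legendre_and_derivative(M, x)
        nodes.append(x)
        weights.append(2.0 / ((1.0 - x * x) * dp * dp))
    return nodes, weights


def product_gauss_sphere(M):
    """Points and weights of a product rule on S^2 with N = 2 M^2 points."""
    z_nodes, z_weights = gauss_legendre(M)
    points, weights = [], []
    for z, w in zip(z_nodes, z_weights):
        rho = math.sqrt(1.0 - z * z)
        for j in range(2 * M):
            phi = math.pi * j / M                        # 2M equally spaced longitudes
            points.append((rho * math.cos(phi), rho * math.sin(phi), z))
            weights.append(w * math.pi / M)
    return points, weights
```

All weights of such a rule are positive, which is the property that the interpolatory quadrature described above lacked. With this layout polynomials up to degree $2M - 1$ are integrated exactly.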


K,V  [  [  (KF)  (s,  t)C3  {s)Cd  (tx)dsdt.  (3.6) 

Jsr- *  J- 1 
All codes for computation were written in MATLAB 6. The actual computation took place on a SUN Ultra10 with 256 MB main memory, 691 MB virtual memory and SUN OS operating system release 5.7. To increase the computational speed all parts of the MATLAB code were written with as few for-loops as possible. This gave an improvement in speed by a factor of more than 500.

5  Computational  results 

The theoretical results have been applied to the so-called Shepp-Logan phantom, which is usually used as a test function for tomographic reconstruction algorithms. It is a three dimensional model of a human head consisting of 10 ellipsoids (see Shepp [9]) which were shrunk here to fit into the unit sphere $S^2$. Figure 1 shows a cut at $x_3 = 0.2721$.

Let  3  =  1, .  •  •  >  10,  denote  the  axes  of  the  j-th  ellipsoid,  denote  its 

density  value  and  —  s[^  the  diameter  of  the  ellipsoid  in  the  direction  of  £  E  S2.  Since 
the  Radon  transform  is  linear,  the  Radon  transform  of  the  Shepp-Logan  phantom  can 
be  calculated  to  be 

10  /  (?)  (i) 

KF(s,t)  =  -  s)  (-2 .  --1- ■ 

Figure 2 shows the reconstruction results according to formula (3.6) for Cesaro-means of index $k = 4$ and for Newman-Shapiro operators.

The values $k = 1.6$ and $k = 2$ were tested, too, but for high degrees $\mu$ no convergent behaviour could be observed.

For Cesaro-means with $k = 4$ and for Newman-Shapiro operators Figure 2 clearly shows an improving behaviour of the reconstructions for increasing $\mu$.

The Newman-Shapiro operators show a better convergence and for $\mu > 150$ even the small structures in the original head can be detected in the reconstruction. It can be expected that for higher degrees $\mu$ this behaviour will become more evident.

Unfortunately, for $\mu > 170$ the computation of the coefficients for the Newman-Shapiro operators caused some numerical problems so that the calculations were stopped with $\mu = 160$. Although the numerical results look quite promising, the drawback in the reconstruction is the computational time. For $\mu = 160$ the computation took 27.5 hours for the Radon transform and 31 hours for the evaluation at the points $x \in [-1, 1]^2$. The evaluation was done on an equidistant grid of 200 x 200 points.


Fig. 1. Shepp-Logan phantom (horizontal axis from -1 to 1).

Fig. 2. Reconstruction of the Shepp-Logan phantom (two rows of panels, each for $\mu = 40$, $\mu = 100$ and $\mu = 160$).

In principle there is no problem to produce three dimensional reconstructions. The evaluation points $x$ only have to be chosen from a grid in $[-1, 1]^3$. Because of the time consuming calculations this has not yet been done here.

Abstract

We present a unified approach to fast algorithms for various discrete trigonometric transforms. With the help of so-called Euler formulas we describe an elegant and useful connection between Fourier matrices and trigonometric matrices. It is known that FFTs are closely related to the factorizations of the unitary Fourier matrix into a product of unitary sparse matrices. Using these Euler formulas and FFTs, we obtain fast algorithms for discrete trigonometric transforms. As a further consequence of these Euler formulas and Gaussian sums, we compute all eigenvalues of some trigonometric matrices.


1  Introduction. 

The fast Fourier transform (FFT) and related algorithms for orthogonal trigonometric transforms are essential tools for practical computations. Special discrete trigonometric transforms are the discrete Hartley transforms (DHT), discrete cosine transforms (DCT), and the discrete sine transforms (DST) of various types. These transforms have found important applications in approximation methods with Chebyshev polynomials, quadrature methods of Clenshaw-Curtis type (see [3]), signal processing, and image compression (see [4, 6, 9]).

Euler formulas describe the algebraic connection between Fourier matrices of a certain type and corresponding cosine and sine matrices. Using these formulas, FFTs can be transformed into fast and stable algorithms for the DCT and DST. Further, from these Euler formulas the orthogonality of various trigonometric matrices follows immediately. For simplicity we consider only symmetric trigonometric matrices, i.e. Fourier and Hartley matrices of type I and IV as well as cosine and sine matrices of type I, IV, V and VIII.

This paper is organized as follows: first we introduce generalized Fourier matrices. New Euler formulas for these matrices describe a close connection with various orthogonal Hartley, cosine and sine matrices. These results simplify and extend former results of [9], pp. 83-96. Applying these Euler formulas and FFTs, we obtain fast algorithms for discrete trigonometric transforms. As a further consequence of these formulas and Gaussian sums, we can compute all eigenvalues of orthogonal symmetric trigonometric matrices.

2  Euler  formulas  for  Fourier  matrices  of  type  I 

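Let $N > 2$ be a given integer. The Fourier matrix of type I is the classical Fourier matrix defined in unitary form

$F^{I}_N := \frac{1}{\sqrt{N}} \left( \omega_N^{jk} \right)_{j,k=0}^{N-1}$

with $\omega_N := \exp(-2\pi i/N)$. Note that the Gaussian sum (see [5], pp. 326-330) yields the trace of $F^{I}_N$:

$\mathrm{tr}\, F^{I}_N = \frac{1}{\sqrt{N}} \sum_{j=0}^{N-1} \omega_N^{j^2} = \frac{1 + i^N}{1 + i}$.   (2.1)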

Closely related with type I Fourier matrices are the cosine and sine matrices of types I and V:

$C^{I}_{N+1} := \sqrt{\frac{2}{N}} \left( \varepsilon^N_j \varepsilon^N_k \cos\frac{jk\pi}{N} \right)_{j,k=0}^{N}$,   $S^{I}_{N-1} := \sqrt{\frac{2}{N}} \left( \sin\frac{(j+1)(k+1)\pi}{N} \right)_{j,k=0}^{N-2}$,

$C^{V}_{N+1} := \frac{2}{\sqrt{2N+1}} \left( \varepsilon^{N+1}_j \varepsilon^{N+1}_k \cos\frac{2jk\pi}{2N+1} \right)_{j,k=0}^{N}$,   $S^{V}_{N} := \frac{2}{\sqrt{2N+1}} \left( \sin\frac{2(j+1)(k+1)\pi}{2N+1} \right)_{j,k=0}^{N-1}$.

Here we set $\varepsilon^N_j := \sqrt{2}/2$ for $j \in \{0, N\}$ and $\varepsilon^N_j := 1$ for $j \in \{1, \ldots, N-1\}$. In this notation a subscript of a matrix denotes the order, while a superscript signifies the type of the matrix. In the following, $I_N$ denotes the identity matrix and $J_N$ the counteridentity matrix, which has the columns of $I_N$ in reverse order. Blanks in a block matrix indicate blocks of zeros. The direct sum of matrices $A$, $B$ will be denoted by $A \oplus B$. Defining the orthogonal matrices


Pv 

5  *2JV+1-  — 


we obtain for Fourier matrices of type I the following Euler formulas:

Theorem 2.1 Depending on whether the order of the Fourier matrix of type I is even or odd, we have

$(P^{I}_{2N})^T F^{I}_{2N} P^{I}_{2N} = C^{I}_{N+1} \oplus (-i)\, S^{I}_{N-1}$,   (2.2)

$(P^{V}_{2N+1})^T F^{I}_{2N+1} P^{V}_{2N+1} = C^{V}_{N+1} \oplus (-i)\, S^{V}_{N}$.   (2.3)


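Proof: It is obvious that $(P^{I}_{2N})^T P^{I}_{2N} = I_{2N}$. Splitting $F^{I}_{2N}$ into four blocks

$F^{I}_{2N} = \frac{1}{\sqrt{2N}} \begin{pmatrix} (\omega_{2N}^{jk})_{j,k=0}^{N} & (\omega_{2N}^{j(N+k+1)})_{j,k=0}^{N,\,N-2} \\ (\omega_{2N}^{(N+j+1)k})_{j,k=0}^{N-2,\,N} & (\omega_{2N}^{(N+j+1)(N+k+1)})_{j,k=0}^{N-2} \end{pmatrix}$

and using the classical Euler formula $\exp(-ix) = \cos x - i \sin x$, we obtain (2.2) by blockwise computation of $(P^{I}_{2N})^T F^{I}_{2N} P^{I}_{2N}$. The proof of (2.3) is similar. □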


Remark 2.2 An analogous result to (2.2) can be found in [9], pp. 85-90, but with a complex matrix instead of $P^{I}_{2N}$. Compare also with [1]. The Euler formula (2.3) is new. Note that the results and their proofs are simpler than in [9], pp. 85-90 and [1].

Corollary 2.3 The matrices $C^{I}_{N+1}$, $S^{I}_{N-1}$, $C^{V}_{N+1}$, $S^{V}_{N}$ are orthogonal.

Proof: Since $F^{I}_{2N}$ is unitary and $P^{I}_{2N}$ is orthogonal, $C^{I}_{N+1} \oplus (-i)\, S^{I}_{N-1}$ is unitary by (2.2). Hence the real matrices $C^{I}_{N+1}$ and $S^{I}_{N-1}$ are orthogonal. Other proofs can be found in [4], pp. 12-16 and [6].

The proof for the type V matrices uses (2.3) and follows similar lines. □

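Remark 2.4 Results analogous to (2.2) and (2.3) are true for the Hartley matrix of type I (see [9], pp. 77-80 and [8], pp. 224-227)

$H^{I}_N := \frac{1}{\sqrt{N}} \left( \mathrm{cas}\,\frac{2jk\pi}{N} \right)_{j,k=0}^{N-1}$

with $\mathrm{cas}\, x := \cos x + \sin x$. Then we obtain the formulas

$(P^{I}_{2N})^T H^{I}_{2N} P^{I}_{2N} = C^{I}_{N+1} \oplus S^{I}_{N-1}$,   (2.4)

$(P^{V}_{2N+1})^T H^{I}_{2N+1} P^{V}_{2N+1} = C^{V}_{N+1} \oplus S^{V}_{N}$.   (2.5)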

The Euler formula (2.2) can be used for fast and numerically stable computations of DCTs and DSTs of type I: Let $x \in \mathbb{R}^{N+1}$ and $y \in \mathbb{R}^{N-1}$ with $N = 2^t$ ($t > 2$) and set $z := (x^T, y^T)^T \in \mathbb{R}^{2N}$. Since $P^{I}_{2N} z$ is real, we can apply Edson's algorithm for the FFT of real data (see [8], pp. 215-223 and [7]). The output of the conjugate even result is in the form $U_{2N} F^{I}_{2N} (P^{I}_{2N} z)$, where $U_{2N} := (I_{N+1} \oplus (-i) I_{N-1}) (P^{I}_{2N})^T$. Therefore, by (2.2), we have calculated $C^{I}_{N+1} x$ and $S^{I}_{N-1} y$ simultaneously using $5Nt$ flops.

If we have to use an FFT with complex data, we combine real data vectors $x, x' \in \mathbb{R}^{N+1}$ and $y, y' \in \mathbb{R}^{N-1}$ into the complex vector $z' := ((x + i x')^T, (y + i y')^T)^T$. Then we can compute two DCTs $C^{I}_{N+1} x$, $C^{I}_{N+1} x'$ and two DSTs $S^{I}_{N-1} y$, $S^{I}_{N-1} y'$ simultaneously via an FFT of length $2N$ applied to the complex input vector $P^{I}_{2N} z'$.

In a similar way, the Euler formula (2.3) can be used for fast computations of DCTs and DSTs of type V: For given $x, x' \in \mathbb{R}^{N+1}$ and $y, y' \in \mathbb{R}^{N}$ the transformed vectors $C^{V}_{N+1} x$, $C^{V}_{N+1} x'$, $S^{V}_{N} y$, and $S^{V}_{N} y'$ can be calculated at the same time as components of $(P^{V}_{2N+1})^T F^{I}_{2N+1} P^{V}_{2N+1} z' = (C^{V}_{N+1} \oplus (-i)\, S^{V}_{N}) z'$, where we use an FFT of length $2N + 1$ with complex data $P^{V}_{2N+1} z'$. If $2N + 1 = 3^t$ or more generally, if $2N + 1$ is a product of small primes (see [8], pp. 76-101 and [7]) the FFT of length $2N + 1$ can be computed very efficiently.


3 Euler formulas for Fourier matrices of type IV

The Fourier matrix of type IV, defined by

$F^{IV}_N := \frac{1}{\sqrt{N}} \left( \omega_{4N}^{(2j+1)(2k+1)} \right)_{j,k=0}^{N-1}$,

is related to the Fourier matrix of type I by the formula

$F^{IV}_N = \omega_{4N}\, W_N F^{I}_N W_N$   (3.1)

with $W_N := \mathrm{diag}(\omega_{2N}^{k})_{k=0}^{N-1}$ and is therefore unitary. If $N$ is a power of 2 or 3, then $F^{IV}_N$ can be factorized into a product of sparse unitary matrices.

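Lemma 3.1 The trace of the Fourier matrix of type IV is equal to

$\mathrm{tr}\, F^{IV}_N = \frac{1}{\sqrt{N}} \sum_{j=0}^{N-1} \omega_{4N}^{(2j+1)^2} = \frac{1 - i^N}{1 + i}$.   (3.2)

Proof: We begin with the generalized Gaussian sum (see [5], p. 330)

$\frac{1}{\sqrt{N}} \sum_{j=0}^{2N-1} \omega_{4N}^{j^2} = 1 - i$,

which we split into two sums containing even and odd $j$ respectively. Then

$\frac{1}{\sqrt{N}} \sum_{j=0}^{2N-1} \omega_{4N}^{j^2} = \frac{1}{\sqrt{N}} \sum_{j=0}^{N-1} \omega_{N}^{j^2} + \frac{1}{\sqrt{N}} \sum_{j=0}^{N-1} \omega_{4N}^{(2j+1)^2} = \mathrm{tr}\, F^{I}_N + \mathrm{tr}\, F^{IV}_N$

and the result follows by (2.1). □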
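Both the relation (3.1) and the trace formula of Lemma 3.1 can be checked numerically with a short Python sketch. The function names are hypothetical; the matrices are built directly from $\omega_N = \exp(-2\pi i/N)$.

```python
import cmath
import math


def fourier_I(N):
    """Unitary type I Fourier matrix (omega_N^{jk}) / sqrt(N)."""
    return [[cmath.exp(-2j * cmath.pi * j * k / N) / math.sqrt(N)
             for k in range(N)] for j in range(N)]


def fourier_IV(N):
    """Unitary type IV Fourier matrix (omega_{4N}^{(2j+1)(2k+1)}) / sqrt(N)."""
    return [[cmath.exp(-2j * cmath.pi * (2 * j + 1) * (2 * k + 1) / (4 * N)) / math.sqrt(N)
             for k in range(N)] for j in range(N)]


def fourier_IV_from_I(N):
    """Right-hand side of (3.1): omega_{4N} W_N F^I_N W_N with W_N = diag(omega_{2N}^k)."""
    F = fourier_I(N)
    w = [cmath.exp(-1j * cmath.pi * k / N) for k in range(N)]   # omega_{2N}^k
    w4 = cmath.exp(-1j * cmath.pi / (2 * N))                    # omega_{4N}
    return [[w4 * w[j] * F[j][k] * w[k] for k in range(N)] for j in range(N)]


def trace_IV_closed_form(N):
    """Closed form (1 - i^N) / (1 + i) from Lemma 3.1."""
    return (1 - 1j ** N) / (1 + 1j)
```

The relation (3.1) is what allows a type IV transform to be computed with an ordinary FFT: one scales the input by $W_N$, applies the type I transform, and scales the output by $\omega_{4N} W_N$.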

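Now we introduce cosine and sine matrices of type IV and VIII which are closely related with the Fourier matrix of type IV:

$C^{IV}_N := \sqrt{\frac{2}{N}} \left( \cos\frac{(2j+1)(2k+1)\pi}{4N} \right)_{j,k=0}^{N-1}$,   $S^{IV}_N := \sqrt{\frac{2}{N}} \left( \sin\frac{(2j+1)(2k+1)\pi}{4N} \right)_{j,k=0}^{N-1}$,

$C^{VIII}_N := \frac{2}{\sqrt{2N+1}} \left( \cos\frac{(2j+1)(2k+1)\pi}{2(2N+1)} \right)_{j,k=0}^{N-1}$,   $S^{VIII}_{N+1} := \frac{2}{\sqrt{2N+1}} \left( \varepsilon^{N+1}_{j+1} \varepsilon^{N+1}_{k+1} \sin\frac{(2j+1)(2k+1)\pi}{2(2N+1)} \right)_{j,k=0}^{N}$.

As above we define orthogonal matrices

$P^{IV}_{2N} := \frac{1}{\sqrt{2}} \begin{pmatrix} I_N & I_N \\ -J_N & J_N \end{pmatrix}$,   $P^{VIII}_{2N+1} := \frac{1}{\sqrt{2}} \begin{pmatrix} I_N & I_N & \\ & & \sqrt{2} \\ -J_N & J_N & \end{pmatrix}$.

Theorem 3.2 For the Fourier matrix of type IV and even resp. odd order, we obtain the following Euler formulas:

$(P^{IV}_{2N})^T F^{IV}_{2N} P^{IV}_{2N} = C^{IV}_N \oplus (-i)\, S^{IV}_N$,   (3.3)

$(P^{VIII}_{2N+1})^T F^{IV}_{2N+1} P^{VIII}_{2N+1} = C^{VIII}_N \oplus (-i)\, S^{VIII}_{N+1}$.   (3.4)

Proof: Similar to that of Theorem 2.1. □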

Corollary 3.3 The matrices $C^{IV}_N$, $S^{IV}_N$, $C^{VIII}_N$ and $S^{VIII}_{N+1}$ are orthogonal.

Remark 3.4 An analogous result to (3.3) can be found in [9], pp. 94-96. Compare also with [1]. Formula (3.4) is new. A different proof of the orthogonality of $C^{IV}_N$ and $S^{IV}_N$ can be found in [6].

Remark 3.5 Similar formulas as in Theorem 3.2 are true for the Hartley matrix of type IV (see [1, 2])

$H^{IV}_N := \frac{1}{\sqrt{N}} \left( \mathrm{cas}\,\frac{(2j+1)(2k+1)\pi}{2N} \right)_{j,k=0}^{N-1}$.

Then we have

$(P^{IV}_{2N})^T H^{IV}_{2N} P^{IV}_{2N} = C^{IV}_N \oplus S^{IV}_N$,   (3.5)

$(P^{VIII}_{2N+1})^T H^{IV}_{2N+1} P^{VIII}_{2N+1} = C^{VIII}_N \oplus S^{VIII}_{N+1}$.   (3.6)

The Euler formulas can be used for a fast and numerically stable computation of DCT and DST of types IV and VIII:

Using (3.3) and (3.1), for arbitrary $x, x', y, y' \in \mathbb{R}^N$ the DCTs $C^{IV}_N x$, $C^{IV}_N x'$ and DSTs $S^{IV}_N y$ and $S^{IV}_N y'$ can be calculated via one FFT of length $2N$ with complex data, where $z' := ((x + i x')^T, (y + i y')^T)^T$. If $N = 2^t$, this procedure requires about $10Nt$ operations.

Likewise by (3.4), for $x, x' \in \mathbb{R}^{N}$ and $y, y' \in \mathbb{R}^{N+1}$ the DCTs of type VIII, $C^{VIII}_N x$, $C^{VIII}_N x'$, and the DSTs $S^{VIII}_{N+1} y$, $S^{VIII}_{N+1} y'$ can be calculated via one FFT of length $2N + 1$ with complex data.
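The use of the Euler formula for type IV can be sketched in Python for one real pair $x, y \in \mathbb{R}^N$. The sketch assumes the butterfly matrix $\frac{1}{\sqrt{2}}\begin{pmatrix} I_N & I_N \\ -J_N & J_N \end{pmatrix}$ for $P^{IV}_{2N}$, which is a reconstruction from the damaged display in the source, and it replaces the FFT by a plain $O(N^2)$ type IV transform so that it stays self-contained. All function names are hypothetical.

```python
import cmath
import math


def dft_type_iv(v):
    """Unitary type IV Fourier transform of v (plain O(n^2) stand-in for an FFT)."""
    n = len(v)
    return [sum(cmath.exp(-2j * cmath.pi * (2 * j + 1) * (2 * k + 1) / (4 * n)) * v[k]
                for k in range(n)) / math.sqrt(n)
            for j in range(n)]


def dct4_dst4_via_dft(x, y):
    """Return (C^IV x, S^IV y) from one type IV transform of length 2N."""
    N = len(x)
    s = math.sqrt(2.0)
    # u = P z with z = (x; y): top block x + y, bottom block J (y - x), both scaled.
    u = [(x[k] + y[k]) / s for k in range(N)]
    u += [(y[N - 1 - k] - x[N - 1 - k]) / s for k in range(N)]
    w = dft_type_iv(u)
    top, bottom = w[:N], w[N:]
    # P^T w: the first block equals C^IV x, the second block equals -i S^IV y.
    cx = [((top[k] - bottom[N - 1 - k]) / s).real for k in range(N)]
    sy = [-((top[k] + bottom[N - 1 - k]) / s).imag for k in range(N)]
    return cx, sy
```

In a fast implementation `dft_type_iv` would be replaced by an FFT of length $2N$ combined with the diagonal scalings of (3.1); a second pair $x', y'$ can be carried in the imaginary parts of the input, as described above.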

Remark 3.6 The sine, cosine, Hartley, and Fourier matrices considered above enjoy the interesting intertwining relations (see [2]):

$C^{I}_{N+1} J_{N+1} = \Sigma_{N+1} C^{I}_{N+1}$,   $S^{I}_{N-1} J_{N-1} = \Sigma_{N-1} S^{I}_{N-1}$,   $C^{IV}_N J_N = \Sigma_N S^{IV}_N$,
$H^{I}_N J'_N = J'_N H^{I}_N$,   $H^{IV}_N J_N = J_N H^{IV}_N$,
$F^{I}_N J'_N = J'_N F^{I}_N$,   $F^{IV}_N J_N = J_N F^{IV}_N$,   (3.7)

with the diagonal matrix $\Sigma_{N+1} := \mathrm{diag}((-1)^k)_{k=0}^{N}$ and the reflection matrix $J'_N := 1 \oplus J_{N-1}$. Therefore applying (3.7) in the above algorithm, it is also possible to compute four DCTs (or four DSTs) of type IV and order $N$ via one FFT of length $2N$ with complex data.

4  Eigenvalues  of  trigonometric  matrices 

Finally we determine the eigenvalues of trigonometric matrices introduced above. Since the cosine and sine matrices of type I, IV, V and VIII, and the Hartley matrices of type I and IV are real, symmetric and orthogonal, only 1 and -1 are possible eigenvalues. For $x \in \mathbb{R}$ we denote by $\lfloor x \rfloor$ resp. $\lceil x \rceil$ the integer $k \in \mathbb{Z}$ with $k \le x < k + 1$ resp. $k - 1 < x \le k$.

Theorem 4.1 The sine and cosine matrices $C^{I}_N$, $S^{I}_N$, $C^{IV}_N$, $S^{IV}_N$, $C^{V}_N$, $S^{V}_N$, $C^{VIII}_N$, and $S^{VIII}_N$ of order $N > 2$ possess the eigenvalues 1 and -1 with multiplicities

$m(1) = \lceil N/2 \rceil$,   $m(-1) = \lfloor N/2 \rfloor$.

Proof: Since $C^{I}_N$ is symmetric and orthogonal, only 1 and -1 can be eigenvalues. Their multiplicities fulfil

$m(1) + m(-1) = N$.

On the other hand, since $C^{I}_N$ and $S^{I}_{N-2}$ are real, it follows from (2.2) and the trace formula (2.1) that

$m(1) - m(-1) = \mathrm{tr}\, C^{I}_N = \mathrm{Re}(\mathrm{tr}\, F^{I}_{2N-2}) = \mathrm{Re}\, \frac{1 + i^{2N-2}}{1 + i} = \begin{cases} 1 & \text{for odd } N, \\ 0 & \text{for even } N. \end{cases}$

From these two linear equations we obtain $m(1) = \lceil N/2 \rceil$ and $m(-1) = \lfloor N/2 \rfloor$. In the other cases, the proof is similar. □
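The counting argument of the proof can be reproduced numerically for the cosine matrix of type I. The Python sketch below builds $C^{I}$ of a given order, confirms that it is orthogonal, and recovers the multiplicities from the two linear equations for $m(1)$ and $m(-1)$; the function names are hypothetical.

```python
import math


def cosine_matrix_I(order):
    """C^I of the given order n >= 2: sqrt(2/(n-1)) (eps_j eps_k cos(j k pi/(n-1)))."""
    N = order - 1
    eps = [math.sqrt(2.0) / 2.0 if j in (0, N) else 1.0 for j in range(order)]
    return [[math.sqrt(2.0 / N) * eps[j] * eps[k] * math.cos(j * k * math.pi / N)
             for k in range(order)] for j in range(order)]


def multiplicities(C):
    """Solve m(1) + m(-1) = n and m(1) - m(-1) = tr C for a symmetric orthogonal C."""
    n = len(C)
    t = round(sum(C[j][j] for j in range(n)))
    return (n + t) // 2, (n - t) // 2


def is_orthogonal(C, tol=1e-9):
    """Check C C^T = I entrywise."""
    n = len(C)
    return all(abs(sum(C[i][k] * C[j][k] for k in range(n)) - (1.0 if i == j else 0.0)) < tol
               for i in range(n) for j in range(n))
```

The trace alternates between 1 (odd order) and 0 (even order), which is exactly what yields $m(1) = \lceil N/2 \rceil$ and $m(-1) = \lfloor N/2 \rfloor$.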


From Theorem 4.1 and the Euler formulas (2.2)-(2.3) and (3.3)-(3.4) it follows immediately:

Corollary 4.2 The Fourier matrices of type I and IV have only eigenvalues $1$, $-1$, $i$, $-i$ with multiplicities:

[Table of multiplicities; entries not recoverable from the source.]

From Theorem 4.1 and formulas (2.4)-(2.5) and (3.5)-(3.6) it follows:

Corollary 4.3 The Hartley matrices of type I and IV have only eigenvalues 1 and -1 with the following multiplicities:

[Table of the multiplicities $m(1)$ and $m(-1)$ for the Hartley matrices of even and odd order; entries not recoverable from the source.]